Chapter 1

Introduction 


Progress in qubit technology requires accurate and reliable methods for qubit
characterization. There are several reasons characterization is important. One is diagnostics.
Qubit operations are susceptible to various types of errors due to imperfect control
pulses, qubit-qubit couplings (crosstalk), and environmental noise. In order to improve
qubit performance, it is necessary to identify the types and magnitudes of these errors
and reduce them. Another reason is the need for metrics of gate quality that are both
platform independent and descriptive enough to enable the assessment of qubit
performance under a variety of conditions. Metrics are also necessary to ascertain
whether the requirements of quantum error correction (QEC) are being met, i.e.,
whether the gate error rates are below a suitable QEC threshold. Full characterization
of quantum processes provides detailed information on gate errors as well as metrics of
qubit quality such as gate fidelity.

Several methods of qubit characterization are currently available. In chronological
order of their development, the main techniques are quantum state tomography (QST) [1],
quantum process tomography (QPT) [2, 3], randomized benchmarking (RB) [4, 5, 6, 7],
and quantum gate set tomography (GST) [8, 9]. All of these tools, with the exception of
GST, have been well studied and systematized, and have gained widespread acceptance
and use in the quantum computing research community.

GST grew out of QPT, but is somewhat more demanding in terms of the number
of experiments required as well as the post-processing. As we will see, obtaining a
GST estimate involves solving a highly nonlinear optimization problem. In addition,
the scaling with system size is polynomially worse than QPT because of the need to
characterize multiple gates at once. Approximately 80 experiments are required for a
single qubit and over 4,000 for 2 qubits to estimate a complete gate set, compared to 16
and 256 experiments respectively to reconstruct a single 1- or 2-qubit gate with QPT.
Methods for streamlining the resource requirements for GST are under investigation
[10, 11]. Nevertheless, single-qubit GST has been demonstrated by several groups
and 2-qubit GST is widely believed to be achievable. Importantly, these groups have
convincingly shown that GST outperforms QPT in situations relevant to fault-tolerant
quantum information processing (QIP). As a result, it seems clear that GST in
either its present or some future form will supersede QPT as the only accurate method
available to fully characterize qubits for fault-tolerant quantum computing. In this
document, we present GST in the hope that readers will be inspired to implement it
with their qubits and explore the possibilities it offers.

GST arose from the observation that QPT is inaccurate in the presence of state
preparation and measurement (SPAM) errors. It will be useful to classify SPAM errors
into two different types, which we will call intrinsic and extrinsic. Intrinsic SPAM
errors are those that are inherent in the state preparation and measurement process.
One example is an error initializing the $|0\rangle$ state due to thermal populations of excited
states. Another is dark counts when attempting to measure, say, the $|1\rangle$ state. Extrinsic
SPAM errors are those due to errors in the gates used to transform the initial state to
the starting state (or set of states) for the experiment to be performed. In QPT, the
starting states must form an informationally complete basis of the Hilbert-Schmidt space
on which the gate being estimated acts. These are typically created by applying gates
to a given initial state, usually the $|0\rangle$ state, and these gates themselves may be faulty.

Intrinsic SPAM errors are of particular relevance to fault-tolerant quantum computing,
since it turns out that QEC requirements are much more stringent on gates than on
SPAM. According to recent results from IBM [10], a 50-fold increase in intrinsic SPAM
error reduces the surface code threshold by only a factor of 3-4. Therefore QPT - the
accuracy of which degrades with increasing SPAM - would not be able to determine if
a qubit meets threshold requirements when the ratio of intrinsic SPAM to gate error is
large.

This is not an issue for extrinsic SPAM errors, which go to zero as the errors on the
gates go to zero. Nevertheless, extrinsic SPAM error interferes with diagnostics: as an
example, QPT cannot distinguish an over-rotation error on a single gate from the same
error on all gates (see Sec. 4.4). In addition, Merkel, et al. have found that, for a broad
range of gate error - including the thresholds of leading QEC code candidates - the ratio
of QPT estimation error to gate error increases as the gate error itself decreases @]. This
makes QPT less reliable as gate quality improves.

Extrinsic SPAM error is also unsatisfactory from a theoretical point of view: QPT 
assumes the ability to perfectly prepare a complete set of states and measurements. In 
reality, these states and measurements are prepared using the same faulty gates that 
QPT attempts to characterize. One would like to have a characterization technique that 
takes account of SPAM gates self-consistently. We shall see that GST is able to resolve 
all of these issues. 

Another approach to dealing with SPAM errors is provided by randomized
benchmarking 0, iH0. RB is based on the idea of twirling [l^, [l^ - the gate being
characterized is averaged in such a way that the resulting process is depolarizing with
the same average fidelity as the original gate. The depolarizing parameter of the
averaged process is measured experimentally, and the result is related back to the average
fidelity of the original gate. This technique is independent of the particular starting
state of the experiment, and therefore is not affected by SPAM errors. However, RB has
several shortcomings which make it unsatisfactory as a sole characterization technique
for fault-tolerant QIP. For one thing, it is limited to Clifford gates^, and so cannot be
used to characterize a universal gate set for quantum computing. For another, RB
provides only a single metric of gate quality, the average fidelity.^ This can be insufficient
for determining the correct qubit error model to use for evaluating compatibility with
QEC. Several groups have shown that qualitatively different errors can produce the same
average gate fidelity, and in the case of coherent errors the depolarizing channel inferred
from the RB gate fidelity underestimates the effect of the error [18, 19, 20]. Finally, RB
assumes the errors on subsequent gates are independent. This assumption fails in the
presence of non-Markovian, or time-dependent, noise. GST suffers from this assumption
as well, but the long sequences used in RB make this a more pressing issue.

Despite these apparent shortcomings, RB has been used with great success by several
groups to measure gate fidelities and to diagnose and correct errors 21, HSH. RB
also has the advantage of scalability - the resources required to implement RB (number
of experiments, processing time) scale polynomially with the number of qubits being
characterized [g, [^. QPT and GST, on the other hand, scale exponentially with the
number of qubits. As a result, these techniques will foreseeably be limited to addressing
no more than 2-3 qubits at a time. In our view, GST and RB will end up complementing
each other as elements of a larger characterization protocol for any future multi-qubit
quantum computer. We will not discuss RB further in this document.

This document reviews GST and provides instructions and examples for workers who
would like to implement GST to characterize their qubits. The goal is to provide a guide
that is both practical and self-contained. For simplicity, consideration is restricted to
a single qubit throughout. We begin in Chapter 2 with a review of the mathematical
background common to all characterization techniques. This includes the representation
of gates as quantum maps via the process matrix and Pauli transfer matrix, the
superoperator formalism, and the Choi-Jamiolkowski representation, which allows the maximum
likelihood estimation (MLE) problem to be formulated as a semidefinite program (SDP).
Chapter 3 begins with a review of quantum state and process tomography, the
characterization techniques that underlie GST. We then continue to GST itself, including a
derivation of linear-inversion gate set tomography (LGST) following Ref. [9]. Although
LGST is weaker than MLE since it does not in general provide physical estimates, it
is useful both as a starting point for the nonlinear optimization associated with MLE,
and also in its own right as a technique for getting some information about the gate set
quickly and with little numerical effort. This is followed by a description of the MLE
problem in both the process matrix [25] and Pauli transfer matrix formulations.
MLE is the standard approach for obtaining physical gate estimates from tomographic
data. In Chapter 4 we continue with GST, presenting the detailed experimental protocol
as well as numerical results implementing the ideas of Chapter 3. The implementation
utilizes simulated data with simplified but realistic errors. This data was designed to
incorporate the important properties of actual data including coherent and incoherent
errors and finite sampling noise. Using this simulated data, we compare the performance
of QPT and GST.

^For recent work discussing generalizations of RB to non-Clifford gates, see Refs. fDI. [ir|.

^Recently, generalizations of RB have been proposed to measure leakage rates [1^ and the
coherence [13] of errors. These advances have the promise to turn RB into a more widely
applicable characterization tool.
Chapter 2

Mathematical preliminaries 


In this chapter we review the mathematical background on quantum operations and 
establish notation that we will use later on. 


2.1 Quantum operations 


Quantum operations are linear maps on density operators, $\rho \to \Lambda(\rho)$, that take an
arbitrary initial state $\rho \in \mathcal{H}$ in some Hilbert space $\mathcal{H}$ and output another state
$\Lambda(\rho) \in \mathcal{H}$. Although the formalism we present is applicable to arbitrary ($d$-level, or
qudit) quantum systems, we will restrict ourselves to systems of qubits, i.e. $\mathcal{H} = \mathcal{H}_n$,
where $\mathcal{H}_n$ is some $2^n$-dimensional ($n$-qubit) Hilbert space. This is done in the interest of
concreteness, and because most experiments involve qubit systems. Thus, for example, we will
explicitly use the familiar qubit basis of Pauli operators rather than a more general qudit basis.

For such a map to describe a physical process, two basic requirements must be met:
(1) for any initial state of the universe (system + environment), the final state after
application of the map must have nonnegative probabilities for measuring the eigenstate
of any observable (in other words, the density matrix of the universe is always positive
semidefinite), and (2) total probability must be conserved.^ Requirement (1) is known
as Complete Positivity (CP) and (2) is called Trace Preservation (TP), since the total
probability of all eigenstates for the state $\rho$ is ${\rm Tr}\{\rho\}$. A physical map is then a CPTP
map.

It turns out that a general CP map on just the system of interest (the environment
having been traced out) can be written as [28]

$$\Lambda(\rho) = \sum_{i=1}^{N} K_i \rho K_i^\dagger, \qquad (2.1)$$

for some $N \le d^2$, where $d = 2^n$ is the Hilbert space dimension. This is called the Kraus
representation. The $K_i$ are Kraus operators, and need not be unitary, hermitian, or
invertible. The map is also TP if $\sum_i K_i^\dagger K_i = I$, where $I$ denotes the identity.^ This is
called the completeness condition.

^Requirement (2) is relaxed in the presence of leakage errors (transitions outside the qubit subspace).
In this case, the total probability must not increase. For an interesting discussion of non-trace preserving
maps, see Ref. [j^ .


2.1.1 Examples of quantum operations 

Eq. (2.1) contains unitary evolution as a special case: $N = 1$, $K_1 \in SU(d)$. It also
describes non-unitary processes, such as depolarization and amplitude damping, that
can represent coupling to an environment (such as an external electromagnetic field).
Below are some examples of the Kraus representation for a single qubit that describe
familiar quantum processes.


1. Depolarization - This operation replaces the input state with a completely mixed
   state with probability $p$, and does nothing with probability $1 - p$. It describes a
   noise process where information is completely lost with probability $p$. For a single
   qubit, depolarization is given by the quantum map [i^ .

   $$\Lambda_{\rm dep}(\rho) = \left(1 - \frac{3p}{4}\right)\rho + \frac{p}{4}\left(X\rho X + Y\rho Y + Z\rho Z\right). \qquad (2.2)$$

   The Kraus operators are $\sqrt{1 - 3p/4}\,I$, $\sqrt{p}\,X/2$, $\sqrt{p}\,Y/2$, $\sqrt{p}\,Z/2$.^

2. Dephasing - This process can arise when the energy splitting of a qubit fluctuates
   as a function of time due to coupling to the environment. Charge noise affecting
   a transmon qubit is of this type. Dephasing is represented by a phase-flip
   channel [ 2 ^, which describes the loss of phase information with probability $p$. This
   channel projects the state onto the z-axis of the Bloch sphere with probability $p$,
   and does nothing with probability $1 - p$:

   $$\Lambda_z(\rho) = \left(1 - \frac{p}{2}\right)\rho + \frac{p}{2}\,Z\rho Z. \qquad (2.3)$$

   Here the Kraus operators are $\sqrt{1 - p/2}\,I$, $\sqrt{p/2}\,Z$.

3. Spontaneous Emission / Amplitude Damping - This process describes energy loss
   from a quantum system, the standard example being spontaneous emission from
   an atom [^. The Kraus operators are

   $$K_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-p} \end{pmatrix}, \qquad K_1 = \begin{pmatrix} 0 & \sqrt{p} \\ 0 & 0 \end{pmatrix}. \qquad (2.4)$$

   $K_1$ changes $|1\rangle$ to $|0\rangle$, corresponding to the emission of a photon to the environment.
   $K_0$ leaves $|0\rangle$ unchanged but reduces the amplitude of $|1\rangle$, representing the change
   in the relative probabilities of the two states when no photon loss is observed.
   Note that the form of $K_0$ is fixed by $K_1$, the completeness condition, and the
   requirement that $K_0$ not change the amplitude of the $|0\rangle$ state.

^To see this, we take the trace of Eq. (2.1) and rearrange operators inside the trace:
${\rm Tr}\{\Lambda(\rho)\} = {\rm Tr}\{\sum_i K_i^\dagger K_i \rho\}$. By trace preservation, ${\rm Tr}\{\Lambda(\rho)\} = {\rm Tr}\{\rho\}$. We therefore
have ${\rm Tr}\{\sum_i K_i^\dagger K_i \rho\} = {\rm Tr}\{\rho\}$ for any $\rho$, which can only be true if $\sum_i K_i^\dagger K_i = I$.

^$X$, $Y$, $Z$ are the single-qubit Pauli operators:

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
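The three channels above can be checked numerically. The following is a minimal NumPy sketch, not part of the original text: the function names (`depolarizing_kraus`, `apply_channel`, `satisfies_completeness`, and so on) are illustrative assumptions. It builds the Kraus operators of Eqs. (2.2)-(2.4), tests the completeness condition, and applies amplitude damping to the excited state.

```python
import numpy as np

# Single-qubit Pauli operators
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarizing_kraus(p):
    """Kraus operators of the depolarizing channel, Eq. (2.2)."""
    return [np.sqrt(1 - 3 * p / 4) * I2,
            np.sqrt(p) / 2 * X,
            np.sqrt(p) / 2 * Y,
            np.sqrt(p) / 2 * Z]

def dephasing_kraus(p):
    """Kraus operators of the phase-flip channel, Eq. (2.3)."""
    return [np.sqrt(1 - p / 2) * I2, np.sqrt(p / 2) * Z]

def amplitude_damping_kraus(p):
    """Kraus operators of amplitude damping, Eq. (2.4)."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)
    return [K0, K1]

def apply_channel(kraus, rho):
    """Evaluate Lambda(rho) = sum_i K_i rho K_i^dagger, Eq. (2.1)."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def satisfies_completeness(kraus):
    """Check the completeness condition sum_i K_i^dagger K_i = I."""
    total = sum(K.conj().T @ K for K in kraus)
    return np.allclose(total, np.eye(2))

# Amplitude damping with p = 0.3 acting on the excited state |1><1|
rho_excited = np.diag([0, 1]).astype(complex)
rho_out = apply_channel(amplitude_damping_kraus(0.3), rho_excited)
```

With $p = 0.3$ a fraction $p$ of the excited-state population moves to $|0\rangle$, so `rho_out` is diagonal with entries 0.3 and 0.7.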


It is worth noting that at no point did we explicitly invoke the time dependence of
quantum maps written in terms of Kraus operators. In fact, the time dependence is
implicitly built in, through the parameter $p$ appearing in the examples above. Starting
from Eq. (2.1), and under the assumptions of local, Markovian noise, we can derive
the Lindblad equation for the time evolution of the system density matrix, $\rho$. The
Lindblad operators are related to the Kraus operators via $K_i = \sqrt{t}\,L_i$. (See for example
Ref. [29], Eq. (4.62).) As a result, $p \propto t$, as we would expect from Fermi's golden rule
for transitions induced by coupling to an environment. We do not worry about explicit
time evolution in this document, and therefore use Eq. (2.1) rather than the Lindblad
equation, assuming implicitly that quantum operations take a finite time $t$.

For a given quantum operation, the Kraus operators are not unique [^. They can
be fixed by expressing them in a particular basis of $\mathcal{H}_n$. A typical choice is the Pauli
basis, $\mathcal{P}^{\otimes n}$, where $\mathcal{P} = \{I, X, Y, Z\}$. We follow this convention here.


2.1.2 Process matrix 

The Pauli basis leads to a useful representation of a quantum map: the process matrix,
or $\chi$-matrix. Expanding $K_i$ in terms of Pauli operators, we obtain $K_i = \sum_{j=1}^{d^2} a_{ij} P_j$,
$P_j \in \mathcal{P}^{\otimes n}$. Inserting this expression into Eq. (2.1) gives

$$\Lambda(\rho) = \sum_{j,k=1}^{d^2} \chi_{jk}\, P_j \rho P_k, \qquad (2.5)$$

where $\chi_{jk} = \sum_i a_{ij} a_{ik}^*$ is a $d^2 \times d^2$ complex-valued matrix, $d = 2^n$. The $\chi$-matrix
completely determines the map $\Lambda$. From its definition in terms of the $a$-coefficients, we
find that $\chi$ is Hermitian and positive semidefinite.^ In addition, we have
$\sum_i K_i^\dagger K_i = \sum_{ijk} a_{ik}^* P_k\, a_{ij} P_j = \sum_{jk} \left(\sum_i a_{ij} a_{ik}^*\right) P_k P_j = \sum_{jk} \chi_{jk} P_k P_j$.
Thus for a TP map, the completeness condition reads $\sum_{jk} \chi_{jk} P_k P_j = I$. Hence, the
$\chi$-matrix for a physical (CPTP) map has $d^2(d^2 - 1)$ free parameters. (A $d^2 \times d^2$ complex
hermitian matrix has $d^4$ free parameters, and the completeness condition adds $d^2$ constraints.)

^Indeed, $\chi_{kj}^* = \sum_i a_{ik}^* a_{ij} = \chi_{jk}$, so $\chi$ is Hermitian. Also, for any $d^2$-dimensional
complex vector $v$, we have $v^\dagger \chi v = \sum_{jk} v_j^* \chi_{jk} v_k = \sum_{ijk} (a_{ij} v_j^*)(a_{ik}^* v_k) = \sum_i \left|\sum_j a_{ij} v_j^*\right|^2 \ge 0$.
Therefore $\chi$ is positive semidefinite.
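The construction of the process matrix from a set of Kraus operators can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than code from the original text: the helper names `chi_from_kraus` and `completeness_lhs` are hypothetical, and the expansion coefficients are obtained from $a_{ij} = {\rm Tr}\{P_j K_i\}/d$, which holds for the unnormalized single-qubit Paulis.

```python
import numpy as np

# Unnormalized single-qubit Pauli basis, ordered (I, X, Y, Z)
PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def chi_from_kraus(kraus, paulis=PAULIS):
    """Process matrix chi_jk = sum_i a_ij a_ik^*, with K_i = sum_j a_ij P_j."""
    d = paulis[0].shape[0]
    # a[i, j] = Tr{P_j K_i} / d, using Tr{P_j P_k} = d * delta_jk
    a = np.array([[np.trace(P @ K) / d for P in paulis] for K in kraus])
    return a.T @ a.conj()

def completeness_lhs(chi, paulis=PAULIS):
    """Left-hand side of the TP condition, sum_jk chi_jk P_k P_j."""
    n = len(paulis)
    return sum(chi[j, k] * (paulis[k] @ paulis[j])
               for j in range(n) for k in range(n))

# Depolarizing channel with p = 0.2, Kraus operators as given for Eq. (2.2)
p = 0.2
kraus_dep = [np.sqrt(1 - 3 * p / 4) * PAULIS[0]] + \
            [np.sqrt(p) / 2 * P for P in PAULIS[1:]]
chi_dep = chi_from_kraus(kraus_dep)
```

For this channel `chi_dep` is diagonal, and `completeness_lhs(chi_dep)` returns the $2 \times 2$ identity, as the TP condition requires.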
Examples

1. Depolarization - The depolarizing channel, Eq. (2.2), is already written in the form
   of Eq. (2.5). We can simply read off the coefficients of the process matrix,

   $$\chi = \begin{pmatrix} 1 - 3p/4 & 0 & 0 & 0 \\ 0 & p/4 & 0 & 0 \\ 0 & 0 & p/4 & 0 \\ 0 & 0 & 0 & p/4 \end{pmatrix}.$$

2. Dephasing - As for depolarization, Eq. (2.3) is already written in the form of
   Eq. (2.5). The process matrix is

   $$\chi = \begin{pmatrix} 1 - p/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & p/2 \end{pmatrix}.$$

3. Rotations - A rotation by angle $\theta$ about axis $\hat n$ applied to a state $\rho$ is given by
   $\Lambda_\theta(\rho) = e^{-i\theta \hat n \cdot \vec\sigma/2}\, \rho\, e^{i\theta \hat n \cdot \vec\sigma/2}$. For simplicity, let us choose the rotation axis to be the
   z-axis. Expanding $e^{-i\theta \hat n \cdot \vec\sigma/2} = \cos(\theta/2) - i\,\hat n \cdot \vec\sigma \sin(\theta/2)$, and setting $\hat n = \hat z$, we get

   $$\chi = \frac{1}{2}\begin{pmatrix} 1 + \cos\theta & 0 & 0 & i\sin\theta \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -i\sin\theta & 0 & 0 & 1 - \cos\theta \end{pmatrix}.$$

4. Spontaneous Emission / Amplitude Damping - Expanding the Kraus operators for
   amplitude damping, Eq. (2.4), in terms of the Paulis, we find

   $$\chi = \frac{1}{4}\begin{pmatrix} (1 + \sqrt{1-p})^2 & 0 & 0 & p \\ 0 & p & -ip & 0 \\ 0 & ip & p & 0 \\ p & 0 & 0 & (1 - \sqrt{1-p})^2 \end{pmatrix}.$$


2.1.3 Pauli transfer matrix 

Another useful representation of a quantum map is the Pauli transfer matrix (PTM):

$$(R_\Lambda)_{ij} = \frac{1}{d}\,{\rm Tr}\{P_i \Lambda(P_j)\}. \qquad (2.6)$$

The PTM has several convenient properties. Since $\Lambda(\rho)$ is a density matrix, it can
be expanded in the $\{P_i\}$ with real coefficients in the interval [-1,1]. (This is true for
any Hermitian operator.) Inspection of the right-hand side of Eq. (2.6) then shows
that all entries of the PTM must also be real and in the interval [-1,1]. Also, as we
show in Section 2.2, the PTM of a composite map is simply the matrix product of the
individual PTMs. This makes it easy to evaluate the result of multiple gates acting in
succession. The process matrix does not have this property. We note that $R_\Lambda$ is generally
not symmetric.

In the PTM representation, the trace preservation (TP) constraint has a particularly
simple form. Since ${\rm Tr}\{P_j\} = d\,\delta_{0j}$, we must have ${\rm Tr}\{\Lambda(P_j)\} = d\,\delta_{0j}$. But
${\rm Tr}\{\Lambda(P_j)\} = {\rm Tr}\{P_0 \Lambda(P_j)\} = d\,(R_\Lambda)_{0j}$. Therefore the TP constraint is $(R_\Lambda)_{0j} = \delta_{0j}$.
In other words, the first row of the PTM is a one followed by all zeros.

Complete positivity, on the other hand, is not expressed as simply in the PTM
representation as it is for the $\chi$-matrix. We discuss this more in Sec. 2.3.

Another sometimes useful condition that is conveniently expressible via the PTM
is unitality. A unital map is one that takes the identity to the identity. Physically,
this is a map that does not increase the purity of a state, i.e. it does not 'unmix' a
mixed state. Many quantum operations of interest, including unitary maps, are unital.
One interesting exception is amplitude damping, as we will see in the examples below.
Mathematically, unitality is expressed as $\Lambda(I) = I$. Inserting this into the definition of
the PTM, Eq. (2.6), we find $(R_\Lambda)_{i0} = \frac{1}{d}{\rm Tr}\{P_i \Lambda(I)\} = \frac{1}{d}{\rm Tr}\{P_i\} = \delta_{i0}$. Therefore the PTM
for a unital map must have a one followed by all zeros in its first column.

The PTM and process matrix representations are equivalent, and one can express one
in terms of the other. Substituting Eq. (2.5) into Eq. (2.6) with $\rho = P_j$ gives
$(R_\Lambda)_{ij} = \frac{1}{d}\sum_{kl} \chi_{kl}\,{\rm Tr}\{P_i P_k P_j P_l\}$. (Note that $\rho = P_j$ is not a physical density operator:
${\rm Tr}\{P_j\} \ne 1$. But $\Lambda(P_j)$ is still formally defined.) The inverse transformation (expressing $\chi$ in terms
of $R$) cannot be written as a compact analytical expression but can be implemented
straightforwardly for any particular PTM using symbolic or numerical software.
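As a hedged illustration of the conversion just described, the sketch below evaluates $(R_\Lambda)_{ij} = \frac{1}{d}\sum_{kl}\chi_{kl}{\rm Tr}\{P_i P_k P_j P_l\}$ for a single qubit with NumPy. The function name `ptm_from_chi` is an assumption made for this example, and the small imaginary part left by floating-point arithmetic is discarded on the assumption that $\chi$ is Hermitian.

```python
import numpy as np

# Unnormalized single-qubit Pauli basis stacked as an array of shape (4, 2, 2)
PAULIS = np.array([[[1, 0], [0, 1]],
                   [[0, 1], [1, 0]],
                   [[0, -1j], [1j, 0]],
                   [[1, 0], [0, -1]]], dtype=complex)

def ptm_from_chi(chi, paulis=PAULIS):
    """PTM from the process matrix: R_ij = (1/d) sum_kl chi_kl Tr{P_i P_k P_j P_l}."""
    d = paulis.shape[1]
    # traces[i, j, k, l] = Tr{P_i P_k P_j P_l}
    traces = np.einsum("iab,kbc,jcd,lda->ijkl", paulis, paulis, paulis, paulis)
    R = np.einsum("kl,ijkl->ij", chi, traces) / d
    return R.real  # imaginary part vanishes for a Hermitian chi

p = 0.2
chi_depolarizing = np.diag([1 - 3 * p / 4, p / 4, p / 4, p / 4]).astype(complex)
chi_dephasing = np.diag([1 - p / 2, 0, 0, p / 2]).astype(complex)

R_depolarizing = ptm_from_chi(chi_depolarizing)
R_dephasing = ptm_from_chi(chi_dephasing)
```

For $p = 0.2$ the results are the diagonal matrices ${\rm diag}(1, 0.8, 0.8, 0.8)$ and ${\rm diag}(1, 0.8, 0.8, 1)$, matching the depolarizing and dephasing PTMs.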

Examples 

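Using the above definitions, we can write down the PTMs for the examples in Sec. 2.1.2.
(See that section for definitions of $p$ and $\theta$.)

1. Depolarization

   $$R_\Lambda = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1-p & 0 & 0 \\ 0 & 0 & 1-p & 0 \\ 0 & 0 & 0 & 1-p \end{pmatrix}.$$

2. Dephasing

   $$R_\Lambda = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1-p & 0 & 0 \\ 0 & 0 & 1-p & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

3. Rotation by $\theta$ about z-axis

   $$R_\Lambda = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

4. Spontaneous Emission / Amplitude Damping

   $$R_\Lambda = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \sqrt{1-p} & 0 & 0 \\ 0 & 0 & \sqrt{1-p} & 0 \\ p & 0 & 0 & 1-p \end{pmatrix}.$$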

These examples illustrate the convenient structure of the PTM. Note in particular
examples 3 and 4. For rotations, the PTM reduces to block-diagonal form consisting
of the 1-dimensional identity matrix and a 3 x 3 rotation matrix. This rotation matrix
determines the transformation of the 1-qubit Bloch vector under the rotation map. For
amplitude damping, the form of the PTM is much simpler than that of the corresponding
$\chi$-matrix and makes the non-unitality apparent.

From the PTMs, we can see by inspection that depolarization, dephasing, and
rotations are all trace preserving and unital. Amplitude damping, however, is trace
preserving but not unital (the first column of the PTM has a non-zero entry beyond the first
entry). Physically, amplitude damping corresponds to spontaneous emission from the
$|1\rangle$ to the $|0\rangle$ state. Therefore, any mixed state tends towards the pure state $|0\rangle$ at long
enough times. (This is in fact the procedure typically used experimentally to "reset" a
qubit to the $|0\rangle$ state - waiting long enough for spontaneous emission to occur.)
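The inspection described above can also be done numerically. The sketch below is an illustration only (the names `ptm_from_kraus`, `is_trace_preserving`, and `is_unital` are assumptions, not from the original text): it evaluates Eq. (2.6) directly from a list of Kraus operators and then tests the first row and first column of the resulting PTM.

```python
import numpy as np

# Unnormalized single-qubit Pauli basis, ordered (I, X, Y, Z)
PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def ptm_from_kraus(kraus, paulis=PAULIS):
    """Pauli transfer matrix R_ij = (1/d) Tr{P_i Lambda(P_j)}, Eq. (2.6)."""
    d = paulis[0].shape[0]
    def channel(op):
        return sum(K @ op @ K.conj().T for K in kraus)
    return np.array([[np.trace(Pi @ channel(Pj)).real / d for Pj in paulis]
                     for Pi in paulis])

def is_trace_preserving(R):
    """TP: the first row of the PTM is (1, 0, 0, 0)."""
    return np.allclose(R[0], [1, 0, 0, 0])

def is_unital(R):
    """Unital: the first column of the PTM is (1, 0, 0, 0)."""
    return np.allclose(R[:, 0], [1, 0, 0, 0])

# Amplitude damping with p = 0.3, Eq. (2.4)
p = 0.3
K0 = np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex)
K1 = np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)
R_damping = ptm_from_kraus([K0, K1])

# Rotation by theta = 0.4 about the z-axis
theta = 0.4
U = np.cos(theta / 2) * PAULIS[0] - 1j * np.sin(theta / 2) * PAULIS[3]
R_rotation = ptm_from_kraus([U])
```

Here `R_damping` passes the trace-preservation test but fails the unitality test because of the entry $p$ in its first column, while `R_rotation` passes both.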

We note in passing that invertible quantum maps form a group. It is therefore
natural to look for group representations, since this allows quantum maps to be represented
as matrices, and map composition as the product of representation matrices. Such group
representations are called superoperators. We saw above that PTMs in fact have the
properties we just described. Therefore PTMs form a superoperator group
representation, based on the Pauli matrices. We study this representation below, and show how it
can be used to simplify the description of quantum maps.

2.2 Superoperator formalism 

Manipulations with quantum maps can be conveniently carried out using the
superoperator formalism [30]. In this formalism, density operators $\rho$ on a Hilbert space of dimension
$d$ are represented as vectors $|\rho\rangle\rangle$ in Hilbert-Schmidt space of dimension $d^2$. Quantum
operations (linear maps on density operators) are represented as matrices of dimension
$d^2 \times d^2$. The Hilbert-Schmidt inner product is defined as $\langle\langle A|B\rangle\rangle = {\rm Tr}\{A^\dagger B\}/d$, where
$A$, $B$ are density operators. As we shall see, defining the inner product in this way
enables map composition to be represented as matrix multiplication. It also allows us
to make contact with measurements. The expected value $p$ of a POVM $E$ for a state $\rho$
is $p = {\rm Tr}\{E\rho\}$.^

Although superoperators can be defined relative to any basis of $\mathcal{H}_n$, we continue to
find it convenient to use the Pauli basis introduced above. From this point on, we use
rescaled Pauli operators $P_i \to P_i/\sqrt{d}$. This way the basis is properly normalized, and
we avoid having to write factors of $d$ everywhere.

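We represent states $\rho$ as state vectors $|\rho\rangle\rangle = \sum_k |k\rangle\rangle \langle\langle k|\rho\rangle\rangle$, with components

$$\langle\langle k|\rho\rangle\rangle = {\rm Tr}\{P_k \rho\}, \qquad (2.7)$$

$$P_k \in \{I/\sqrt{2},\, X/\sqrt{2},\, Y/\sqrt{2},\, Z/\sqrt{2}\}^{\otimes n}. \qquad (2.8)$$

This is simply a restatement of the completeness of the Pauli basis: $\rho = \sum_k P_k\,{\rm Tr}\{P_k \rho\}$
for any $\rho$, which implies that the operator $\sum_k P_k\,{\rm Tr}\{P_k\,\cdot\,\}$ equals the identity. In
superoperator notation, this is $\sum_k |k\rangle\rangle\langle\langle k| = I$, from which Eq. (2.7) follows.

Quantum operations are represented as matrices,

$$R_\Lambda = \sum_{jk} |j\rangle\rangle \langle\langle j|R_\Lambda|k\rangle\rangle \langle\langle k|, \qquad (2.9)$$

$$\langle\langle j|R_\Lambda|k\rangle\rangle = {\rm Tr}\{P_j \Lambda(P_k)\}. \qquad (2.10)$$

Note that $\langle\langle j|R_\Lambda|k\rangle\rangle$ is just the PTM, $(R_\Lambda)_{jk}$. (The $1/d$ factor was suppressed by our
rescaling.)

From these definitions, it is straightforward to derive the following consequences:

$$|\Lambda(\rho)\rangle\rangle = R_\Lambda |\rho\rangle\rangle, \qquad (2.11)$$

$$R_{\Lambda_2 \circ \Lambda_1} = R_{\Lambda_2} R_{\Lambda_1}. \qquad (2.12)$$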

Proof. To prove Eq. (2.11), we first use the completeness relation and then Eq. (2.7),

$$|\Lambda(\rho)\rangle\rangle = \sum_k |k\rangle\rangle \langle\langle k|\Lambda(\rho)\rangle\rangle = \sum_k |k\rangle\rangle\,{\rm Tr}\{P_k \Lambda(\rho)\}.$$

Next, we expand $\rho = \sum_j P_j\,{\rm Tr}\{P_j \rho\}$ and insert into the right-hand side of the above
equation, giving

$$|\Lambda(\rho)\rangle\rangle = \sum_{jk} |k\rangle\rangle\,{\rm Tr}\{P_k \Lambda(P_j)\}\,{\rm Tr}\{P_j \rho\}.$$

Substituting ${\rm Tr}\{P_k \Lambda(P_j)\} = \langle\langle k|R_\Lambda|j\rangle\rangle$ (see Eq. (2.10)) and ${\rm Tr}\{P_j \rho\} = \langle\langle j|\rho\rangle\rangle$ (Eq. (2.7)
again) we obtain

$$|\Lambda(\rho)\rangle\rangle = \sum_{jk} |k\rangle\rangle \langle\langle k|R_\Lambda|j\rangle\rangle \langle\langle j|\rho\rangle\rangle.$$

Finally, we eliminate the sums over $j$, $k$ using the completeness relation, thus proving
Eq. (2.11).

To prove Eq. (2.12), we act with $R_{\Lambda_2 \circ \Lambda_1}$ on an arbitrary state $|\rho\rangle\rangle$, and use Eq. (2.11),

$$R_{\Lambda_2 \circ \Lambda_1} |\rho\rangle\rangle = |\Lambda_2(\Lambda_1(\rho))\rangle\rangle.$$

Using Eq. (2.11) again, we find

$$|\Lambda_2(\Lambda_1(\rho))\rangle\rangle = R_{\Lambda_2} |\Lambda_1(\rho)\rangle\rangle = R_{\Lambda_2} R_{\Lambda_1} |\rho\rangle\rangle.$$

This proves Eq. (2.12). □

^POVM stands for Positive Operator-Valued Measure, and denotes a formalism for describing
measurements. A POVM $E$ is also called a measurement operator and is used to represent the average
outcome of measuring a state $\rho$, through the equation $p = {\rm Tr}\{E\rho\}$. POVMs are hermitian, positive
semidefinite, and satisfy the completeness relation $\sum_i E_i = I$, where $i$ enumerates the possible
experimental outcomes. In this document we are concerned primarily with the special case of projective
measurements, for example $E = |0\rangle\langle 0|$, and use the POVM formalism simply as a matter of notational
convenience. For this reason, we do not discuss POVMs in any detail, and refer the reader to Ref. [2^ .
Ch. 2.2.6 for a complete treatment.

In this way, the composition of quantum maps has been expressed as matrix
multiplication. We now have a simple way to write down experimental outcomes. Consider
the following experiment: prepare state $\rho$, perform quantum operation $\Lambda$, measure the
POVM $E$. Repeat many times and calculate the average measured value, $m$. The
expected value, $p = E(m)$, of the measurement $m$ is the quantum expectation value, which
can be written as $p = {\rm Tr}\{E\Lambda(\rho)\} = \langle\langle E|R_\Lambda|\rho\rangle\rangle$.
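Both results of this section, the composition rule of Eq. (2.12) and the expression $p = \langle\langle E|R_\Lambda|\rho\rangle\rangle$, can be checked with a short NumPy sketch. The code below is illustrative only: the helper names (`superket`, `ptm`, `rotate_z`, `amplitude_damp`) and the particular choice of a z-rotation followed by amplitude damping are assumptions made for the example.

```python
import numpy as np

# Rescaled single-qubit Pauli basis P_k / sqrt(d), d = 2, as in Eq. (2.8)
BASIS = [M / np.sqrt(2) for M in (
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
)]

def superket(op):
    """Components <<k|op>> = Tr{P_k op}, Eq. (2.7)."""
    return np.array([np.trace(P @ op) for P in BASIS])

def ptm(channel):
    """Matrix elements <<j|R|k>> = Tr{P_j Lambda(P_k)}, Eq. (2.10)."""
    return np.array([[np.trace(Pj @ channel(Pk)) for Pk in BASIS] for Pj in BASIS])

def rotate_z(theta):
    """Unitary rotation by theta about the z-axis, as a map on operators."""
    U = np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])
    return lambda rho: U @ rho @ U.conj().T

def amplitude_damp(p):
    """Amplitude damping with the Kraus operators of Eq. (2.4)."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)
    return lambda rho: K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

lam1, lam2 = rotate_z(0.4), amplitude_damp(0.3)
composed = lambda rho: lam2(lam1(rho))          # Lambda_2 after Lambda_1
R1, R2, R21 = ptm(lam1), ptm(lam2), ptm(composed)

rho = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)   # state |+><+|
E = np.array([[1, 0], [0, 0]], dtype=complex)            # projector |0><0|

p_direct = np.trace(E @ composed(rho)).real               # Tr{E Lambda(rho)}
p_super = (superket(E).conj() @ R21 @ superket(rho)).real # <<E|R|rho>>
```

In this example `R21` agrees with the matrix product `R2 @ R1`, and `p_direct` agrees with `p_super`.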

2.3 Physicality constraints 

As mentioned in Sec. 2.1, a physical map must be completely positive and trace
preserving (CPTP), which is equivalent to the following two conditions: (1) for any initial
state of the universe, the final state after application of the map must have nonnegative
probabilities for measuring the eigenstate of any observable (in other words, the density
matrix of the universe is always positive semidefinite), and (2) total probability must be
conserved. The mathematical statement of these requirements is different for the process
matrix and PTM representations of a quantum map, and will be used later on to write
down the constraints for the optimization problem for the gate set.

2.3.1 Process matrix representation 

We have seen that hermiticity and positive semidefiniteness of the $\chi$ matrix follow from
the Kraus representation, which (as we stated without proof) is a consequence of
complete positivity. Also, we used the Kraus representation to show that trace preservation
implies the completeness condition, $\sum_{ij} \chi_{ij} P_j P_i = I$. Since the process matrix is so
central to our work, it is interesting to see how these constraints follow directly from the
CPTP requirement and Eq. (2.5), which is a general way of writing any quantum map.
We now show this.

Since the CP condition must hold for any initial state, we consider initial states of
the form $\rho = |\phi\rangle\langle\phi| \otimes I$, where $|\phi\rangle$ is a pure state of the system, or qubit, and $I$ refers to
the "rest of the universe". We can neglect reference to the "rest of the universe" since it
will be traced out of all formulas in which the map $\Lambda$ acts only on the system. Therefore
we omit the $\otimes I$ below.

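The probability of observing the pure state $|\psi\rangle$ of the system after application of the
map $\Lambda$ to the initial state $|\phi\rangle$ is

$$p_\psi = {\rm Tr}\{\Lambda(|\phi\rangle\langle\phi|)\,|\psi\rangle\langle\psi|\} = \langle\psi|\Lambda(|\phi\rangle\langle\phi|)|\psi\rangle. \qquad (2.13)$$

Using Eq. (2.5) we obtain

$$\Lambda(|\phi\rangle\langle\phi|) = \sum_{ij} \chi_{ij}\, P_i |\phi\rangle\langle\phi| P_j. \qquad (2.14)$$

Inserting this into Eq. (2.13) we get

$$p_\psi = \sum_{ij} \chi_{ij}\, \langle\psi|P_i|\phi\rangle \langle\phi|P_j|\psi\rangle. \qquad (2.15)$$

Since $p_\psi$ is real, we find upon setting $p_\psi^* = p_\psi$ and interchanging the dummy indices $i$, $j$
in Eq. (2.15), that $\chi^\dagger = \chi$.

Next, let us define a vector $v$ with components $v_j = \langle\phi|P_j|\psi\rangle$. Then Eq. (2.15)
becomes $p_\psi = v^\dagger \chi v$. Since the probability $p_\psi \ge 0$ and $|\psi\rangle$, $|\phi\rangle$ are arbitrary, the matrix
$\chi$ must be positive semidefinite.

Finally, the completeness condition is proved just like for the Kraus representation.
Trace preservation means ${\rm Tr}\{\Lambda(\rho)\} = \sum_{ij} \chi_{ij}\,{\rm Tr}\{P_j P_i \rho\} = {\rm Tr}\{\rho\}$. Since this must hold
for any $\rho$, the completeness condition follows.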

Although we have only proved that these conditions on $\chi$ are necessary for CPTP
(we only considered a restricted set of initial states), it turns out they are also sufficient.
We use this in the formulation of the constraints for the optimization problem in Ch. 3.
It is also clear now that the Kraus representation can be viewed as a special case
of a general $\chi$-representation, where the process matrix is the identity. Indeed, the CP
constraint implies that $\chi$ must have real and nonnegative eigenvalues. Starting from Eq. (2.5)
for an arbitrary (not necessarily Pauli) basis $P_i$, we can diagonalize $\chi$ and absorb the
(real and nonnegative) eigenvalues into the definition of the $K_i$, resulting in Eq. (2.1).
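The diagonalization argument above can be sketched numerically. In the following NumPy example (an illustration under stated assumptions; the function name `kraus_from_chi` and the tolerance `tol` are not from the original text), each eigenvector $v$ of $\chi$ with eigenvalue $\lambda > 0$ yields a Kraus operator $K = \sqrt{\lambda}\sum_j v_j P_j$. The recovered operators need not equal those of Eq. (2.4), since Kraus operators are not unique, but they define the same map.

```python
import numpy as np

# Unnormalized single-qubit Pauli basis, ordered (I, X, Y, Z)
PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def kraus_from_chi(chi, paulis=PAULIS, tol=1e-12):
    """Diagonalize chi and absorb each positive eigenvalue into a Kraus operator."""
    evals, evecs = np.linalg.eigh(chi)          # chi is Hermitian
    kraus = []
    for lam, vec in zip(evals, evecs.T):        # columns of evecs are eigenvectors
        if lam > tol:                           # skip numerically zero eigenvalues
            kraus.append(np.sqrt(lam) * sum(v_j * P for v_j, P in zip(vec, paulis)))
    return kraus

# Process matrix of amplitude damping with p = 0.3, derived from Eq. (2.4)
p = 0.3
s = np.sqrt(1 - p)
chi_damping = 0.25 * np.array([[(1 + s) ** 2, 0, 0, p],
                               [0, p, -1j * p, 0],
                               [0, 1j * p, p, 0],
                               [p, 0, 0, (1 - s) ** 2]])

kraus_ops = kraus_from_chi(chi_damping)

# Apply the recovered Kraus operators to a test state
rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
rho_out = sum(K @ rho @ K.conj().T for K in kraus_ops)
```

The amplitude-damping $\chi$ has rank 2, so two Kraus operators are returned, and `rho_out` matches the result of applying $K_0$ and $K_1$ of Eq. (2.4) directly.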


2.3.2 PTM representation


As we saw above, the TP constraint in the PTM representation is $(R_\Lambda)_{0j} = \delta_{0j}$. The
CP constraint, however, cannot be expressed in terms of the PTM directly. Instead,
one typically uses the fact [31] that the Choi-Jamiolkowski (CJ) matrix [33] associated
with a CP map is positive semidefinite. The CJ matrix is a density-matrix-like
representation of a quantum map that lives in a tensor product of two Hilbert-Schmidt
spaces. For a quantum map $\Lambda$ it can be written in terms of the PTM as

$$\rho_\Lambda = \frac{1}{d^2} \sum_{i,j=1}^{d^2} (R_\Lambda)_{ij}\, P_j^T \otimes P_i, \qquad (2.16)$$

with the $P_i$ taken here as the unnormalized Pauli operators of Eq. (2.6).
In terms of the CJ matrix, the CP constraint is $\rho_\Lambda \succeq 0$, where the curly inequality
denotes positive-semidefiniteness. See Ref. [31] for further discussion.
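A complete-positivity test based on the CJ matrix can be sketched as follows. This is a hedged illustration: the function names `choi_from_ptm` and `is_completely_positive`, the tolerance, and the normalization (a prefactor $1/d^2$ with unnormalized Paulis, so that the CJ matrix of a TP map has unit trace) are assumptions chosen for the example. The transpose map, whose PTM flips the sign of the $Y$ component, serves as a standard example of a positive map that is not completely positive.

```python
import numpy as np

# Unnormalized single-qubit Pauli basis, ordered (I, X, Y, Z)
PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def choi_from_ptm(R, paulis=PAULIS):
    """CJ matrix rho = (1/d^2) sum_ij R_ij P_j^T (x) P_i for a single qubit."""
    d = paulis[0].shape[0]
    n = len(paulis)
    return sum(R[i, j] * np.kron(paulis[j].T, paulis[i])
               for i in range(n) for j in range(n)) / d ** 2

def is_completely_positive(R, tol=1e-10):
    """CP test: the CJ matrix must have no eigenvalue below -tol."""
    return np.linalg.eigvalsh(choi_from_ptm(R)).min() >= -tol

# PTM of amplitude damping with p = 0.3 (a physical, CPTP map)
p = 0.3
R_damping = np.array([[1, 0, 0, 0],
                      [0, np.sqrt(1 - p), 0, 0],
                      [0, 0, np.sqrt(1 - p), 0],
                      [p, 0, 0, 1 - p]])

# PTM of the matrix transpose: trace preserving, but not completely positive
R_transpose = np.diag([1.0, 1.0, -1.0, 1.0])

cp_damping = is_completely_positive(R_damping)
cp_transpose = is_completely_positive(R_transpose)
```

Here `cp_damping` is `True`, while `cp_transpose` is `False` because the CJ matrix of the transpose map has a negative eigenvalue.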
Chapter 3

Derivation of GST 


In this chapter, we derive GST and describe some of the relevant data analysis techniques. 
In the following chapter, we present the experimental and data analysis protocols in step- 
by-step fashion. The reader interested in getting right to the implementation may want 
to skip ahead to that chapter. 

GST can be viewed as a self-consistent extension of quantum process tomography 
(QPT), which itself grew out of quantum state tomography (QST). We therefore begin 
with a brief review of QST and QPT. 

3.1 Quantum state tomography 

Quantum state tomography [1] attempts to characterize an unknown state $\rho$ by measuring its components $\langle\langle k|\rho\rangle\rangle$, usually in the Pauli basis. It can be more convenient to use a different basis of measurement operators $E_j$, $j = 1, \ldots, d^2$, that span the Hilbert-Schmidt space $B(\mathcal{H}_d)$.^1 Generally, one measures the $d^2$ probabilities

$$p_j = \langle\langle E_j|\rho\rangle\rangle = \sum_k \langle\langle E_j|k\rangle\rangle \langle\langle k|\rho\rangle\rangle. \tag{3.1}$$

The matrix $A_{jk} = \langle\langle E_j|k\rangle\rangle$ is assumed known, since the $\langle\langle E_j|$ were chosen in advance by the experimenter, and the $|k\rangle\rangle$ are Pauli basis vectors.

The probability $p_j$ can be estimated as the sample average $m_j = \sum_{i=1}^{N} m_{ij}/N$ of $N$ single-shot measurements of $E_j$, where $m_{ij} = 0$ or $1$ is the outcome of the $i$-th measurement. Since $p_j$ is the true probability, the expected value of $m_j$ is $E(m_j) = \sum_i E(m_{ij})/N = p_j$, and the variance is $\mathrm{Var}(m_j) = p_j(1 - p_j)/N$. The sample average $m_j$ approaches the true value $p_j$ as $N \to \infty$.

Since $A_{jk}$ is known, Eq. (3.1) can be solved by matrix inversion, $|\hat\rho\rangle\rangle = A^{-1}|m\rangle\rangle$, where $|m\rangle\rangle$ is the vector of measurements $m_j$ and $\hat\rho$ is the linear inversion estimator for $\rho$. The only requirement is that the $E_j$ be chosen such that $A$ is invertible.

Estimation is particularly simple if $|E_j\rangle\rangle = |j\rangle\rangle$. Then no matrix inversion is necessary and $|\hat\rho\rangle\rangle = |m\rangle\rangle$. (Technically, $m_0$ should be set to $1/\sqrt{d}$, so that $\mathrm{Tr}\{P_0\hat\rho\} = 1$.)

^1 We here use the notation $\mathcal{H}_d$ instead of $\mathcal{H}_n$ as in Ch. 2 to keep the discussion momentarily more general. For $n$ qubits, $d = 2^n$.
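The linear-inversion estimate of Eq. (3.1) can be sketched numerically for a single qubit. The block below is a minimal illustration, not part of the original analysis: the choice of effects (`EFFECTS`), the test state, and the helper names `vec`, `unvec`, and `qst_linear_inversion` are all assumptions made for the example, and the Pauli basis is normalized by an explicit factor $1/\sqrt{2}$.

```python
import numpy as np

# Unnormalized single-qubit Pauli matrices; the 1/sqrt(2) below normalizes the basis.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def vec(op):
    """Pauli-basis components <<k|op>> = Tr{P_k op}/sqrt(2)."""
    return np.array([np.trace(P @ op).real for P in PAULIS]) / np.sqrt(2)

def unvec(v):
    """Inverse of vec: rebuild the 2x2 operator from its Pauli components."""
    return sum(c * P for c, P in zip(v, PAULIS)) / np.sqrt(2)

def qst_linear_inversion(effects, m):
    """Solve Eq. (3.1) for |rho>> given measured frequencies m_j of the effects E_j."""
    A = np.array([vec(E) for E in effects])  # A_jk = <<E_j|k>>
    return unvec(np.linalg.solve(A, m))

def simulate_frequencies(rho, effects, n_shots, rng):
    """Sample averages m_j of n_shots binary outcomes with p_j = Tr{E_j rho}."""
    p = np.clip([np.trace(E @ rho).real for E in effects], 0.0, 1.0)
    return rng.binomial(n_shots, p) / n_shots

# Assumed measurement set: identity plus the +X, +Y, +Z projectors (A is invertible).
EFFECTS = [I2, (I2 + X) / 2, (I2 + Y) / 2, (I2 + Z) / 2]

rho_true = (I2 + 0.6 * X + 0.5 * Z) / 2      # hypothetical state to be estimated
rng = np.random.default_rng(0)
m = simulate_frequencies(rho_true, EFFECTS, 10_000, rng)
rho_hat = qst_linear_inversion(EFFECTS, m)
```

With exact probabilities in place of `m` the inversion returns `rho_true`; with sampled data the error shrinks as $1/\sqrt{N}$, in line with the variance formula above. Nothing in this procedure forces `rho_hat` to be positive semidefinite.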

3.2 Quantum process tomography 

Quantum process tomography [2, 3] is similar to QST, but now the goal is to characterize a gate $G$ rather than a state. This is done by measuring the process matrix $\chi$ or the PTM, $\langle\langle k|G|l\rangle\rangle$, via measurements of the $d^4$ probabilities

$$p_{ij} = \langle\langle E_j|G|\rho_i\rangle\rangle. \tag{3.2}$$

Here the $E_j$ are as before, and the set of vectors $|\rho_i\rangle\rangle$ is a complete basis for the Hilbert-Schmidt space. Since we must have $\mathrm{Tr}\{\rho\} = 1$ and the Pauli matrices are traceless, we cannot choose $|\rho_i\rangle\rangle = |i\rangle\rangle$. Therefore, each $|\rho_i\rangle\rangle$ must be chosen as some linear combination of Pauli basis vectors that can be inverted to give the PTM. Inserting complete sets of states in Eq. (3.2), we obtain

$$p_{ij} = \sum_{kl} \langle\langle E_j|k\rangle\rangle \langle\langle k|G|l\rangle\rangle \langle\langle l|\rho_i\rangle\rangle. \tag{3.3}$$

As before, the vectors $\langle\langle E_j|$, $|\rho_i\rangle\rangle$ are specified by the experimenter, and so the matrices $\langle\langle E_j|k\rangle\rangle$, $\langle\langle l|\rho_i\rangle\rangle$ are assumed known. We can arrange the product of these two matrices into a single matrix, $S_{ij,kl} = \langle\langle E_j|k\rangle\rangle \langle\langle l|\rho_i\rangle\rangle$. Also, we can vectorize the PTM, defining $r_{kl} = G_{kl}$. Then, Eq. (3.3) becomes

$$\vec{p} = S\vec{r}. \tag{3.4}$$

Given a vector of measurements $\vec{m}$ of the probabilities $\vec{p}$, this equation can be inverted to yield an estimate of the PTM:

$$\hat{\vec{r}} = S^{-1}\vec{m}. \tag{3.5}$$

In the discussion so far, we have assumed $S$ is full-rank. ($\{E_j\}$, $\{\rho_i\}$ each form a complete basis, and $S$ is $d^4 \times d^4$.) The estimation can be improved by including an overcomplete ($> d^2$) set of states and measurements. Then $S$ has more rows than columns and cannot be inverted, but $S^T S$ can be. In this case, we obtain the ordinary least-squares estimate as

$$\hat{\vec{r}} = (S^T S)^{-1} S^T \vec{m}. \tag{3.6}$$

In practice QST and QPT estimates are obtained using maximum likelihood estimation rather than linear inversion, since this allows physicality constraints to be put in. We discuss MLE in the context of GST in Sec. 3.5 below.
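As an illustration of Eqs. (3.4) and (3.6), the following sketch builds the matrix $S$ for an overcomplete set of single-qubit states and measurements and recovers a PTM by ordinary least squares. The six cardinal states used for both preparation and measurement, the test gate, and the helper names are assumptions chosen for the example; `numpy.linalg.lstsq` is used in place of forming $(S^T S)^{-1}$ explicitly.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def vec(op):
    """Pauli-basis components Tr{P_k op}/sqrt(2)."""
    return np.array([np.trace(P @ op).real for P in PAULIS]) / np.sqrt(2)

def ptm(U):
    """PTM of rho -> U rho U^dagger: R_kl = Tr{P_k U P_l U^dagger}/2."""
    Ud = U.conj().T
    return np.array([[np.trace(Pk @ U @ Pl @ Ud).real / 2 for Pl in PAULIS]
                     for Pk in PAULIS])

def projector(n):
    """Pure-state projector (I + n.sigma)/2 for a unit Bloch vector n."""
    return (I2 + n[0] * X + n[1] * Y + n[2] * Z) / 2

# Overcomplete set: the six cardinal states act as inputs rho_i and as effects E_j.
AXES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
RHO = np.array([vec(projector(n)) for n in AXES])  # rows are |rho_i>>
EFF = np.array([vec(projector(n)) for n in AXES])  # rows are <<E_j|

# Row (i, j) of S multiplies the row-major vectorized PTM r_kl = G_kl, Eq. (3.4).
S = np.array([np.kron(e, r) for r in RHO for e in EFF])

def qpt_least_squares(m):
    """Least-squares estimate of Eq. (3.6), returned as a 4x4 PTM."""
    r_hat, *_ = np.linalg.lstsq(S, m, rcond=None)
    return r_hat.reshape(4, 4)

U = np.cos(np.pi / 4) * I2 - 1j * np.sin(np.pi / 4) * X   # X_{pi/2}, the test gate
G_true = ptm(U)
p_exact = S @ G_true.reshape(-1)
rng = np.random.default_rng(1)
m = rng.binomial(10_000, np.clip(p_exact, 0, 1)) / 10_000  # simulated frequencies
G_hat = qpt_least_squares(m)
```

Here $S$ has 36 rows and 16 columns, so the extra rows average down the sampling noise rather than determine new parameters.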

3.3 Gate set tomography 

We have seen that QST and QPT assume the initial states and final measurements are known. In fact, these states must be prepared using quantum gates which themselves may be faulty: $\langle\langle E_j| = \langle\langle E|F_j$, $|\rho_i\rangle\rangle = F_i'|\rho\rangle\rangle$, where $|\rho\rangle\rangle$ and $\langle\langle E|$ are some particular starting state and measurement the experimenter is able to implement. This results in a self-consistency problem. If the state preparation and measurement (SPAM) gates $\{F_j\}$, $\{F_i'\}$ are sufficiently faulty, the QST and QPT estimates will be faulty as well. GST solves this problem by including the SPAM gates self-consistently in the gate set to be estimated.

The goal of GST is to completely characterize an unknown set of gates and states,

$$\mathcal{G} = \{|\rho\rangle\rangle, \langle\langle E|, G_0, \ldots, G_K\}, \tag{3.7}$$

where $|\rho\rangle\rangle \in \mathcal{H}_n$ is some initial state, $\langle\langle E| \in \mathcal{H}_n^*$ is a 2-outcome POVM, and each $G_k \in B(\mathcal{H}_n)$ is an $n$-qubit quantum operation in the Hilbert-Schmidt space $B$ of linear operators on $\mathcal{H}_n$. $\mathcal{G}$ is called the gate set. By characterization we mean a procedure to estimate the complete process matrices or PTMs for $\mathcal{G}$ based on experimental data. Sometimes we would like to focus only on the gates and not the states and measurements, and we write the gate set as $\mathcal{G} = \{G_0, \ldots, G_K\}$. The distinction should be clear from the context.

As in QPT, the information needed to reconstruct each gate $G_k$ is contained in measurements of $\langle\langle E_i|G_k|\rho_j\rangle\rangle$, the gate of interest sandwiched between a complete set of states and POVMs. The experimental requirements for GST are therefore similar to QPT: the ability to measure expectation values of the form $p = \mathrm{Tr}\{E\,G(\rho)\} = \langle\langle E|R_G|\rho\rangle\rangle$ for the gate set, Eq. (3.7).

In QPT, the set $\{\langle\langle E_i|, |\rho_j\rangle\rangle\}$ is assumed given, and this leads to incorrect estimates when there are systematic errors on the gates preparing the initial and final states. For self-consistency, we must treat these SPAM gates on the same footing as the original gates $\{G_k\}$. This is done by introducing the SPAM gates^2 explicitly: $|\rho_j\rangle\rangle = F_j|\rho\rangle\rangle$ and $\langle\langle E_i| = \langle\langle E|F_i$. The SPAM gates $\mathcal{F} = \{F_1, \ldots, F_N\}$ are composed of gates in the gate set $\mathcal{G}$, and therefore the minimal $\mathcal{G}$ must include sufficient gates to create a complete set of states and measurements. In contrast to QPT, it is not possible in GST to characterize one gate at a time. Instead, GST estimates every gate in the gate set simultaneously.^3

Because the SPAM gates are included in the gate set, GST requires that only a single initial state $\rho$ be prepared and a single measurement $E$ implemented. This is close to the experimental reality, where $\rho$ is usually the ground state of a qubit or set of qubits, and $E$ is a measurement in the Z-basis.

^2 Reference [?] refers to these gates as fiducial gates, hence the letter F. We follow reference [?] and refer to them as SPAM gates in order to make contact with the notion of SPAM introduced earlier.

^3 More accurately, there is a minimal set of gates that must be estimated all at once, which includes the gates required for SPAM. Additional gates may be added one at a time.


3.4 Linear inversion GST

We now derive a simple, closed-form algorithm for obtaining self-consistent gate estimates. This algorithm was introduced by Robin Blume-Kohout et al. [?] and was inspired by the Gram matrix methods of Cyril Stark [?]. The limitation of linear-inversion GST (or LGST) is that it does not constrain the estimates to be physical. There is often an unphysical gate set that is a better fit to the data than any physical one. As a result, LGST by itself is insufficient to provide gate quality metrics. Nevertheless, LGST provides a convenient method for diagnosing gate errors, and also gives a good starting point for the constrained maximum likelihood estimation (MLE) approaches we discuss later. Also, in cases where the LGST estimate happens to be physical, it is identical to the one found by MLE.

We begin by identifying a set of SPAM gate strings $\mathcal{F} = \{F_1, \ldots, F_{d^2}\}$ that, when applied to our unknown fixed state $|\rho\rangle\rangle$ and measurement $\langle\langle E|$, produce a complete set of initial states $|\rho_k\rangle\rangle = F_k|\rho\rangle\rangle$ and final states $\langle\langle E_k| = \langle\langle E|F_k$.^4 In terms of the Pauli basis, Eq. (2.8), the components of these states are

$$|\rho_k\rangle\rangle_i = \langle\langle i|F_k|\rho\rangle\rangle = \mathrm{Tr}\{P_i F_k(\rho)\}, \tag{3.8}$$

$$\langle\langle E_k|_i = \langle\langle E|F_k|i\rangle\rangle = \mathrm{Tr}\{F_k^\dagger(E) P_i\}. \tag{3.9}$$

The SPAM gates must be composed of members from our gate set, $\mathcal{G}$.^5 (For a single qubit, the completeness requirement means that $\{F_k|\rho\rangle\rangle\}_{k=1}^{4}$ must span the Bloch sphere, and similarly for $\{\langle\langle E|F_k\}$.) In general, SPAM gates have the form [?],

$$F_k = G_{f_{k1}} \circ G_{f_{k2}} \circ \cdots \circ G_{f_{kL_k}}, \tag{3.10}$$

where $\{f_{ki}\}$ are indices labeling the gates of $\mathcal{G} = \{G_0, \ldots, G_K\}$, and $L_k$ is the length of the $k$-th SPAM gate string.

^4 Note that in defining $\langle\langle E_k|$ we use $F_k$ and not $F_k^\dagger$ as one might expect. This is important in the analysis that follows.

^5 If $\mathcal{G}$ is insufficient to produce a complete set of states and measurements in this way, we must add gates to $\mathcal{G}$ until this is possible. In practical applications, one is interested in characterizing a complete set of gates, so this is not a problem.


3.4.1 Example gate sets

It is useful to have concrete examples of gate sets. Here we consider two of the simplest, both for a single qubit. We will continue to use both examples below. (We use the notation $A_\theta$ to denote a rotation by angle $\theta$ about axis $A$; for example, $X_{\pi/2}$ is a $\pi/2$ rotation about the $X$-axis of the Bloch sphere.)

1. $\mathcal{G} = \{\{\}, X_{\pi/2}, Y_{\pi/2}\} = \{G_0, G_1, G_2\}$. The symbol $\{\}$ denotes the "null" gate: do nothing for no time. (We will always choose $G_0$ to be the null gate for reasons that will become clear later.) One choice of SPAM gates is $\mathcal{F} = \{\{\}, X_{\pi/2}, Y_{\pi/2}, X_{\pi/2} \circ X_{\pi/2}\} = \{G_0, G_1, G_2, G_1 \circ G_1\}$. It is easy to see that the set of states $F_k|\rho\rangle\rangle$ (measurements $\langle\langle E|F_k$) spans the Bloch sphere for any pure state $|\rho\rangle\rangle$ (measurement $\langle\langle E|$).

2. $\mathcal{G} = \{\{\}, X_{\pi/2}, Y_{\pi/2}, X_\pi\} = \{G_0, G_1, G_2, G_3\}$. Here the SPAM gates are $\mathcal{F} = \{\{\}, X_{\pi/2}, Y_{\pi/2}, X_\pi\} = \mathcal{G}$. This includes one more gate in the gate set compared to Example 1, but now the SPAM gates do not contain products of gates from the gate set.

In Example 1, we chose $F_4 = X_{\pi/2} \circ X_{\pi/2}$ rather than $X_\pi$ so that we would not need to add the additional gate $X_\pi$ to the gate set $\mathcal{G}$. The choice $F_4 = G_1 \circ G_1$ is more efficient from an experimental standpoint. This can be a consideration if the experimental resources required to implement GST for an additional gate (16 additional experiments) are significant. If this is not a consideration, it turns out to be a disadvantage for the purposes of data analysis to have multiples of the same gate within a single SPAM gate. This is because the order (the polynomial power in the optimization parameters) of the MLE objective function is proportional to the number of gates in the experiment. (This is not so much of an issue if an LGST analysis is sufficient.) Therefore, the choice of gate set depends on the constraints of the experimental implementation as well as the post-processing requirements.
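The completeness requirement on the SPAM gates can be checked numerically by testing whether the prepared states (and the measured effects) span the four-dimensional Hilbert-Schmidt space. The sketch below does this for ideal versions of the two example gate sets. It assumes the initial state is the ground state and the measurement is the projector onto it; the helper names and the counterexample `bad` are illustrative only.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def vec(op):
    """Pauli-basis components Tr{P_k op}/sqrt(2)."""
    return np.array([np.trace(P @ op).real for P in PAULIS]) / np.sqrt(2)

def ptm(U):
    """PTM of rho -> U rho U^dagger."""
    Ud = U.conj().T
    return np.array([[np.trace(Pk @ U @ Pl @ Ud).real / 2 for Pl in PAULIS]
                     for Pk in PAULIS])

def rotation(axis, theta):
    """Unitary for a rotation by theta about the given Pauli axis."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * axis

def spam_rank(fiducials, rho, E, tol=1e-9):
    """Ranks of the matrices whose rows are <<E|F_i and whose columns are F_j|rho>>."""
    A = np.array([vec(E) @ F for F in fiducials])
    B = np.array([F @ vec(rho) for F in fiducials]).T
    return np.linalg.matrix_rank(A, tol=tol), np.linalg.matrix_rank(B, tol=tol)

null = np.eye(4)                       # the null gate {}
Xh = ptm(rotation(X, np.pi / 2))       # ideal X_{pi/2}
Yh = ptm(rotation(Y, np.pi / 2))       # ideal Y_{pi/2}
Xp = ptm(rotation(X, np.pi))           # ideal X_pi
rho0 = (I2 + Z) / 2                    # assumed initial state |0><0|
E0 = (I2 + Z) / 2                      # assumed measurement effect |0><0|

example1 = [null, Xh, Yh, Xh @ Xh]     # F_4 = G_1 o G_1
example2 = [null, Xh, Yh, Xp]          # F = G
bad = [null, Xh, Xh @ Xh, Xh @ Xh @ Xh]  # only X rotations: states stay in the YZ plane

ranks = {name: spam_rank(f, rho0, E0)
         for name, f in [("example1", example1), ("example2", example2), ("bad", bad)]}
```

Both example sets give rank 4 for states and measurements, while a set built from $X$ rotations alone gives rank 3 and would have to be extended.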

3.4.2 Gram matrix and A, B matrices 

In GST, we work with expectation values,

$$p_{ijk} = \langle\langle E|F_i G_k F_j|\rho\rangle\rangle, \tag{3.11}$$

where $F_i, F_j \in \mathcal{F}$ and $G_k \in \mathcal{G}$. These quantities correspond to measurements that can (in principle) be carried out in the lab. Inserting a complete set of states on each side of $G_k$ in Eq. (3.11), we obtain

$$p_{ijk} = \sum_{rs} \langle\langle E|F_i|r\rangle\rangle \langle\langle r|G_k|s\rangle\rangle \langle\langle s|F_j|\rho\rangle\rangle = \sum_{rs} A_{ir} (G_k)_{rs} B_{sj}. \tag{3.12}$$

This defines a set of Pauli transfer matrices, as follows. $(G_k)_{rs} = \langle\langle r|G_k|s\rangle\rangle$ is the $rs$-component of the PTM for the gate $G_k$. $A_{ir} = \langle\langle E|F_i|r\rangle\rangle$ and $B_{sj} = \langle\langle s|F_j|\rho\rangle\rangle$ are, respectively, the $r$-component of $\langle\langle E_i|$ and the $s$-component of $|\rho_j\rangle\rangle$ (see Eqs. (3.8), (3.9)). It is useful to write $A$ and $B$ in component-free notation as

$$A = \sum_i |i\rangle\rangle \langle\langle E|F_i, \tag{3.13}$$

$$B = \sum_j F_j|\rho\rangle\rangle \langle\langle j|. \tag{3.14}$$

One can easily verify that $A_{ir} = \langle\langle E|F_i|r\rangle\rangle$ and $B_{sj} = \langle\langle s|F_j|\rho\rangle\rangle$ as required. We then find that, according to Eq. (3.12), $p_{ijk} = (A G_k B)_{ij}$.

Experimentally measuring the values $p_{ijk}$ in Eq. (3.12) amounts to measuring the $(ij)$ components of the matrix

$$\tilde{G}_k = A G_k B. \tag{3.15}$$

Since $G_0 = \{\}$ (the null gate), the $k = 0$ experiment gives

$$g = \tilde{G}_0 = AB. \tag{3.16}$$

The matrix $g$ is the Gram matrix of the $\{F_i\}$ in the Pauli basis.^6 We observe:

$$g^{-1}\tilde{G}_k = B^{-1}A^{-1}A G_k B = B^{-1} G_k B. \tag{3.17}$$

Thus,

$$\hat{G}_k = g^{-1}\tilde{G}_k \tag{3.18}$$

is an estimate of the gate set, up to a similarity transformation by the unobservable (see next section) matrix $B$.

It can happen that the SPAM gates that are implemented experimentally are not linearly independent within the bounds of sampling error. Then $\{|\rho_j\rangle\rangle\}$ (and $\{\langle\langle E_i|\}$) will not form a basis, and $g$ will have small or zero eigenvalues, so it will not be invertible. If this happens, the experimentalist must adjust the SPAM gates until $g$ is invertible. We will discuss this further in the next chapter.
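The linear-inversion step of Eqs. (3.15)-(3.18) amounts to one linear solve per gate. The sketch below simulates it for Example 1 with exact (noise-free) probabilities. The "true" gate set with small over-rotation and depolarizing errors, the imperfect state and measurement, and all function names are assumptions introduced for the illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def vec(op):
    """Pauli-basis components Tr{P_k op}/sqrt(2)."""
    return np.array([np.trace(P @ op).real for P in PAULIS]) / np.sqrt(2)

def ptm(U):
    """PTM of rho -> U rho U^dagger."""
    Ud = U.conj().T
    return np.array([[np.trace(Pk @ U @ Pl @ Ud).real / 2 for Pl in PAULIS]
                     for Pk in PAULIS])

def rotation(axis, theta):
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * axis

def lgst(p):
    """LGST estimates G_hat_k = g^-1 G_tilde_k, where p[k][i][j] = p_ijk and g = p[0]."""
    g = p[0]
    return [np.linalg.solve(g, p_k) for p_k in p]

# Hypothetical true gate set: over-rotated, slightly depolarized X_{pi/2} and Y_{pi/2}.
depol = np.diag([1.0, 0.98, 0.98, 0.98])
G = [np.eye(4),
     depol @ ptm(rotation(X, np.pi / 2 + 0.03)),
     depol @ ptm(rotation(Y, np.pi / 2 - 0.02))]
rho = vec((I2 + 0.97 * Z) / 2)   # imperfect initial state
E = vec((I2 + 0.95 * Z) / 2)     # imperfect measurement effect

# SPAM gates of Example 1, built from the (faulty) gates themselves.
F = [G[0], G[1], G[2], G[1] @ G[1]]
A = np.array([E @ Fi for Fi in F])        # A_ir = <<E|F_i|r>>
B = np.array([Fj @ rho for Fj in F]).T    # B_sj = <<s|F_j|rho>>

# "Measured" data: exact probabilities p_ijk = (A G_k B)_ij, with no sampling noise.
p = [A @ Gk @ B for Gk in G]
G_hat = lgst(p)
```

The estimates are not equal to the true PTMs, but they agree with $B^{-1}G_kB$, so gauge-invariant quantities such as the trace or the eigenvalues of each gate are reproduced.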

3.4.3 Gauge freedom 

We now show that the matrix $B$ in Eq. (3.17) is unobservable, as mentioned above. Recall that we assume no knowledge of $\mathcal{G}$, which includes $\langle\langle E|$ and $|\rho\rangle\rangle$. All experiments we consider are of the form $\langle\langle E|G_{i_1}G_{i_2}\cdots G_{i_L}|\rho\rangle\rangle$. (This includes measurements of $p_{ijk}$, since the $F_i$ are composed of elements of $\mathcal{G}$.) Transforming $\langle\langle E| \to \langle\langle E'| = \langle\langle E|B$, $|\rho\rangle\rangle \to |\rho'\rangle\rangle = B^{-1}|\rho\rangle\rangle$, $G_k \to G_k' = B^{-1}G_kB$, we find for a general expectation value: $\langle\langle E'|G_{i_1}'G_{i_2}'\cdots G_{i_L}'|\rho'\rangle\rangle = \langle\langle E|G_{i_1}G_{i_2}\cdots G_{i_L}|\rho\rangle\rangle$. Therefore, gates estimated by GST have a gauge freedom: similarity transformation by a matrix, $B$. Any two sets of gates related to each other by a gauge transformation will describe a given set of measurements equally well (given that the states and measurements are transformed in the same way). They will be the same distance away from the actual gates (according to any distance metric) and, if they are physical, will have the same fidelity relative to the actual gate set.
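The invariance of every measurable probability under a gauge transformation is easy to confirm numerically. In the sketch below, arbitrary real $4\times4$ matrices stand in for the PTMs, and the gauge matrix `B` is made diagonally dominant so that it is guaranteed invertible; these choices and the function names are assumptions made for the illustration.

```python
import numpy as np

def sequence_probability(E, gates, rho, sequence):
    """<<E| G_{i_1} G_{i_2} ... G_{i_L} |rho>> for PTMs `gates` and an index list."""
    state = rho
    for idx in reversed(sequence):   # the rightmost gate acts first
        state = gates[idx] @ state
    return float(E @ state)

def gauge_transform(E, gates, rho, B):
    """Apply <<E| -> <<E|B, G -> B^-1 G B, |rho>> -> B^-1 |rho>>."""
    Binv = np.linalg.inv(B)
    return E @ B, [Binv @ G @ B for G in gates], Binv @ rho

rng = np.random.default_rng(2)
gates = [0.5 * rng.normal(size=(4, 4)) for _ in range(3)]  # stand-ins for PTMs
E = rng.normal(size=4)
rho = rng.normal(size=4)

# Diagonally dominant, hence invertible, gauge matrix.
B = 3.0 * np.eye(4) + rng.uniform(-0.5, 0.5, size=(4, 4))
E2, gates2, rho2 = gauge_transform(E, gates, rho, B)

sequences = [[], [1], [0, 2], [2, 1, 1, 0], [1, 2, 0, 2, 1]]
before = [sequence_probability(E, gates, rho, s) for s in sequences]
after = [sequence_probability(E2, gates2, rho2, s) for s in sequences]
```

The two lists agree to floating-point precision even though the individual matrices in `gates2` differ from those in `gates`.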


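Now consider the vectors

$$|\tilde\rho\rangle\rangle \equiv A|\rho\rangle\rangle = \sum_i |i\rangle\rangle \langle\langle E|F_i|\rho\rangle\rangle, \tag{3.19}$$

$$\langle\langle\tilde{E}| \equiv \langle\langle E|B = \sum_j \langle\langle E|F_j|\rho\rangle\rangle \langle\langle j|. \tag{3.20}$$

These vectors are componentwise identical, and the components are measurable quantities: $\langle\langle i|\tilde\rho\rangle\rangle = \langle\langle\tilde{E}|i\rangle\rangle = \langle\langle E|F_i|\rho\rangle\rangle$. They provide a way to construct a gate set consistent with our measurements, up to the gauge freedom $B$. Let

$$|\hat\rho\rangle\rangle \equiv g^{-1}|\tilde\rho\rangle\rangle = B^{-1}|\rho\rangle\rangle, \tag{3.21}$$

$$\langle\langle\hat{E}| \equiv \langle\langle\tilde{E}| = \langle\langle E|B. \tag{3.22}$$

As we saw before,

$$\hat{G}_k \equiv g^{-1}\tilde{G}_k = B^{-1}G_kB. \tag{3.23}$$

The gate set $\hat{\mathcal{G}} = \{|\hat\rho\rangle\rangle, \langle\langle\hat{E}|, \{\hat{G}_k\}\}$ consists entirely of measurable quantities: $\langle\langle E|F_iG_kF_j|\rho\rangle\rangle$ for $g$ and $\tilde{G}_k$, $\langle\langle E|F_i|\rho\rangle\rangle$ for $|\tilde\rho\rangle\rangle$ and $\langle\langle\tilde{E}|$. We can therefore construct it from experimental data. By gauge freedom, measurements in $\hat{\mathcal{G}}$ are equivalent to those in $\mathcal{G}$:

$$\langle\langle\hat{E}|\prod_k \hat{G}_{i_k}|\hat\rho\rangle\rangle = \langle\langle E|B\Big(\prod_k B^{-1}G_{i_k}B\Big)B^{-1}|\rho\rangle\rangle = \langle\langle E|\prod_k G_{i_k}|\rho\rangle\rangle. \tag{3.24}$$

Hence, $\hat{\mathcal{G}}$ is indistinguishable from the true gate set $\mathcal{G}$.

However, since $B$ is never the identity, $\hat{\mathcal{G}}$ is not equal to $\mathcal{G}$. Since $B$ cannot be measured, the best we can do is find an estimate of $B$ that brings us as close as possible to some "target" gate set. The target may be chosen arbitrarily in accordance with the gauge freedom. Choosing it to be the intended experimental gate set allows us to compare the actual gates to ideal ones.

^6 The importance of the null gate is now clear. In order for Eq. (3.16) to hold, $\{\}$ must be an exact identity. Anything else, e.g. an idle gate, would introduce an error term between $A$ and $B$.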

3.4.4 Gauge optimization 

As we just saw, the gauge matrix $B_{ij} = \langle\langle i|F_j|\rho\rangle\rangle = \mathrm{Tr}\{P_iF_j(\rho)\}$ cannot be the identity. (This would require $F_j(\rho) = P_j\ \forall j$, which is impossible for a physical state.) Therefore, the gate set $\hat{\mathcal{G}}$ estimated by LGST is necessarily different from the true gate set $\mathcal{G}$. Unfortunately, experiments have no access to the gauge matrix $B$. But, since experiments look the same regardless of the gauge, we are free to choose a gauge that suits our purposes. This choice makes no practical difference: a quantum computation in any gauge is indistinguishable from the same computation in another gauge.

Experimental qubits these days are quite good: randomized benchmarking gate fidelities in excess of 99% are routinely reported [?]. In this case we know a priori that the measured gates will differ from an ideal (or target) set of gates by some very small error. A reasonable protocol then is to select the gauge such that the estimated gate set is as close as possible to this target gate set. The remaining differences are attributed to systematic gate error and sampling error.

To define closeness of quantum gates, we choose a suitable matrix norm on superoperators. For our purposes, the trace norm is sufficient [?]. Given a target set of gates $\mathcal{T} = \{|\tau\rangle\rangle, \langle\langle\mu|, \{T_k\}\}$ and our LGST estimate $\hat{\mathcal{G}} = \{|\hat\rho\rangle\rangle, \langle\langle\hat{E}|, \{\hat{G}_k\}\}$, we find the matrix $B^*$ that minimizes the RMS discrepancy (trace norm):

$$B^* = \arg\min_B \sum_{k=1}^{K+1} \mathrm{Tr}\left\{\left(\hat{G}_k - B^{-1}T_kB\right)^T\left(\hat{G}_k - B^{-1}T_kB\right)\right\}. \tag{3.25}$$

The index $k$ runs over the entire gate set, and we also include the "gate" $G_{K+1} = |\rho\rangle\rangle\langle\langle E|$. This allows the states to be fitted as well.

Our final LGST estimate is then

$$\hat{G}_k^* = B^*\hat{G}_k(B^*)^{-1}, \tag{3.26}$$

$$|\hat\rho^*\rangle\rangle = B^*|\hat\rho\rangle\rangle, \tag{3.27}$$

$$\langle\langle\hat{E}^*| = \langle\langle\hat{E}|(B^*)^{-1}. \tag{3.28}$$

This is the closest gate set to the target that is consistent with experiments as well as the allowable gauge freedom.
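A gauge optimization of the form of Eq. (3.25) can be sketched with a general-purpose local optimizer. The example below is only an illustration under stated assumptions: it uses `scipy.optimize.minimize` with the BFGS method (the text does not specify an optimizer), a hypothetical over-rotated gate set with ideal state and measurement, and the target gauge matrix as the starting point; the function names are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def vec(op):
    return np.array([np.trace(P @ op).real for P in PAULIS]) / np.sqrt(2)

def ptm(U):
    Ud = U.conj().T
    return np.array([[np.trace(Pk @ U @ Pl @ Ud).real / 2 for Pl in PAULIS]
                     for Pk in PAULIS])

def rotation(axis, theta):
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * axis

def gauge_objective(b_flat, estimates, targets):
    """Sum over k of Tr{D_k^T D_k}, D_k = G_hat_k - B^-1 T_k B, as in Eq. (3.25)."""
    B = b_flat.reshape(4, 4)
    try:
        Binv = np.linalg.inv(B)
    except np.linalg.LinAlgError:
        return 1e12                  # penalize singular gauge matrices
    return sum(np.sum((Gh - Binv @ T @ B) ** 2) for Gh, T in zip(estimates, targets))

def gauge_optimize(estimates, targets, B0):
    res = minimize(gauge_objective, B0.reshape(-1), args=(estimates, targets),
                   method="BFGS")
    return res.x.reshape(4, 4), res.fun

def gauge_matrix(gates, state):
    """Columns F_j|state>> for the Example 1 SPAM gates built from `gates`."""
    fid = [gates[0], gates[1], gates[2], gates[1] @ gates[1]]
    return np.array([F @ state for F in fid]).T

ideal = [np.eye(4), ptm(rotation(X, np.pi / 2)), ptm(rotation(Y, np.pi / 2))]
true = [np.eye(4), ptm(rotation(X, np.pi / 2 + 0.05)), ptm(rotation(Y, np.pi / 2 - 0.04))]
tau = vec((I2 + Z) / 2)              # target state
mu = vec((I2 + Z) / 2)               # target measurement effect
rho, E = tau, mu                     # SPAM taken as ideal here for simplicity

B_true = gauge_matrix(true, rho)     # unknown to the experimenter
Binv = np.linalg.inv(B_true)
# LGST output in the unknown gauge, plus the extra "gate" |rho>><<E|.
estimates = [Binv @ Gk @ B_true for Gk in true] + [Binv @ np.outer(rho, E) @ B_true]
targets = ideal + [np.outer(tau, mu)]

B0 = gauge_matrix(ideal, tau)        # target gauge matrix as the starting point
f_start = gauge_objective(B0.reshape(-1), estimates, targets)
B_star, f_star = gauge_optimize(estimates, targets, B0)
G_star = [B_star @ Gh @ np.linalg.inv(B_star) for Gh in estimates[:3]]
```

Because the optimizer is local, the result depends on `B0`, which is the point taken up in the discussion that follows.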

3.4.5 Discussion 

A few points are worth mentioning. First, we note that $\det(B)$ can be fixed by a requirement on the normalization of $|\hat\rho^*\rangle\rangle$. We do not do this, but rather let $B$ vary over the entire group of real, invertible matrices, $GL(d^2, \mathbb{R})$, letting the optimizer find the (not necessarily physical) gate set consistent with the data. In addition, Eq. (3.25) contains a nonlinear objective function ($B$ and $B^{-1}$ each appear twice), and therefore has multiple minima. As a result, the gauge optimization problem requires a starting point to be specified. This may lead to some unexpected consequences, which we now describe.

Besides having multiple minima, the global minimum of the objective function, Eq. (3.25), is not unique. This indeterminacy is generic in quantum tomography, and appears in QST and QPT as well. In particular, we cannot distinguish $\langle\langle U^\dagger(E)|G|\rho\rangle\rangle$ from $\langle\langle E|G|U(\rho)\rangle\rangle$ for any gate $G$, where $U$ is an arbitrary operation that commutes with $G$. (Depolarizing noise, for example.) The two sets $\{\langle\langle U^\dagger(E)|, |\rho\rangle\rangle\}$, $\{\langle\langle E|, |U(\rho)\rangle\rangle\}$ are generally different. This indeterminacy can be recast as a type of gauge freedom, but in this case fixing the gauge provides no additional information about the true state and measurement. This has consequences for our analysis; an initialization error $|\mathcal{E}(\rho)\rangle\rangle$ (e.g., a hot qubit) cannot be distinguished from a faulty measurement $\langle\langle\mathcal{E}(E)|$ (e.g., dark counts).

Typically, the starting point for numerical optimization of Eq. (3.25) is taken to be the target gauge matrix, defined as $B^0_{ij} = \langle\langle i|S_j|\tau\rangle\rangle$. ($S_j$ is the target for $F_j$, composed of gates in $\mathcal{T}$ in the same way that $F_j$ is composed of gates in $\mathcal{G}$.) This starting point depends on the target $|\tau\rangle\rangle$ for $|\rho\rangle\rangle$ but does not depend on the target $\langle\langle\mu|$ for $\langle\langle E|$. As a result, gauge optimization will always produce $|\hat\rho^*\rangle\rangle \approx |\tau\rangle\rangle$, and will attribute any initialization error to error in $\langle\langle\hat{E}^*|$, regardless of whether the error was actually in $\langle\langle E|$ or $|\rho\rangle\rangle$. This must be kept in mind when interpreting the results of GST for estimating states and measurements. The gate estimates themselves are not affected by this particular gauge freedom.

3.5 Maximum likelihood estimation 

Linear inversion typically does not produce estimates that are physical (it is not constrained to do so). In contrast to QPT (see Sec. 3.2, Eq. (3.6)), it is also incapable of working with overcomplete data, which could be used to improve the estimate. MLE solves both of these problems. (Another approach is to find the closest physical gate set to the LGST estimate, but this is not optimal and also cannot be extended to overcomplete data.) Since the objective function for MLE is nonlinear, the linear inversion estimate is still useful as a starting point for the optimization algorithm.^7

Our goal is to estimate the true probabilities $p_{ijk}$ in Eq. (3.11) based on a set of measurements $m_{ijk}$, subject to physicality constraints. MLE is the natural way to do this: the best estimate is found by fitting to experimental data a theoretical model of the probability of obtaining that data. The resulting estimates $\hat{p}_{ijk}$ are used to find the most likely set of gates $\hat{\mathcal{G}}$ that produced the data.

We begin by parameterizing the $4\times4$ estimate matrices $\hat{G}$, as well as the state and measurement estimates $\hat\rho$, $\hat{E}$, in terms of a vector of parameters, $\vec{t}$: $\hat{G}(\vec{t})$, $|\hat\rho(\vec{t})\rangle\rangle$, $\hat{E}(\vec{t})$. (Although we have written the same vector $\vec{t}$ as the argument in each matrix, the different gates and states depend on non-overlapping subsets of parameters in $\vec{t}$.) There are several possibilities for the parameterization, which we discuss in detail later. The important point for now is that the parameterization is either linear or quadratic in $\vec{t}$: each matrix element of $\hat{G}(\vec{t})$, $|\hat\rho(\vec{t})\rangle\rangle$, $\hat{E}(\vec{t})$ is a polynomial of order 1 or 2 in the parameters $\{t_i\}$. There are $d^4 = 16$ parameters for each gate $G_k$, $k = 1, \ldots, K$ ($k = 0$ is not parameterized), and $d^2 = 4$ parameters each for $E$ and $\rho$.

A number of constraints reduce the total number of independent parameters, as we discuss later. The minimal case of GST on one qubit requires a gate set $\mathcal{G}$ with $K = 2$. This gives $Kd^4 + 2d^2 = 40$ parameters. Equality constraints (trace preservation requirement) reduce the number of parameters by 4 per gate, and by 1 for $\rho$. Thus we are left with 31 free parameters in the minimal instance of GST.
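The parameter count can be reproduced with a few lines of arithmetic. In the sketch below, the function name is hypothetical, and the generalization of "4 per gate" to $d^2$ trace-preservation constraints per gate for more than one qubit is an assumption; only the single-qubit numbers are taken from the text.

```python
def gst_parameter_count(n_qubits, n_gates):
    """Return (total, free) parameter counts for a gate set with K = n_gates.

    total = K*d^4 + 2*d^2 (gates, plus the state and the measurement);
    the free count subtracts d^2 trace-preservation constraints per gate
    (4 per gate for one qubit) and 1 for the unit trace of rho.
    """
    d = 2 ** n_qubits
    total = n_gates * d**4 + 2 * d**2
    constraints = n_gates * d**2 + 1
    return total, total - constraints

total, free = gst_parameter_count(n_qubits=1, n_gates=2)  # minimal single-qubit case
```

For the minimal single-qubit gate set this gives 40 parameters in total and 31 free parameters, matching the count above.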

Putting everything together, the estimates $\hat{p}_{ijk}$ are written in terms of the parameter vector $\vec{t}$, as

$$\hat{p}_{ijk}(\vec{t}) = \langle\langle\hat{E}(\vec{t})|\hat{F}_i(\vec{t})\hat{G}_k(\vec{t})\hat{F}_j(\vec{t})|\hat\rho(\vec{t})\rangle\rangle. \tag{3.29}$$

Depending on the parameterization choice (discussed below), each of the gates and states in Eq. (3.29) is either a linear or quadratic function of its parameters. The estimator $\hat{p}_{ijk}$ is therefore a homogeneous function of order 5 (linear parameterization) or 10 (quadratic parameterization).

MLE proceeds by finding the set of parameters $\vec{t}$ that maximizes an objective function. The objective, or likelihood function, is the probability distribution we assume produced the data. The most general likelihood function for the experiment we have described above is

$$L(\hat{\mathcal{G}}) = \prod_{ijk} (\hat{p}_{ijk})^{m_{ijk}} (1 - \hat{p}_{ijk})^{1 - m_{ijk}}. \tag{3.30}$$

Although several workers [?] have successfully used this likelihood function with their experimental data, it is usually more convenient to use a simpler form. To this end, we invoke the central limit theorem to rewrite the likelihood as a normal distribution,

$$L(\hat{\mathcal{G}}) = \prod_{ijk} \exp\left[-(m_{ijk} - \hat{p}_{ijk})^2/\sigma_{ijk}^2\right], \tag{3.31}$$

where $\sigma^2 = p(1 - p)/n$ is the sampling variance in the measurement $m$.^8

Because the logarithm function is monotonic, maximizing the likelihood $L$ is equivalent to minimizing the negative log-likelihood $l = -\log L$. With a normal likelihood function, the problem reduces to weighted least-squares:

$$\text{Minimize:}\quad l(\hat{\mathcal{G}}) = \sum_{ijk} \left[m_{ijk} - \hat{p}_{ijk}(\vec{t})\right]^2/\sigma_{ijk}^2. \tag{3.32}$$

Or, rewriting the estimators $\hat{p}_{ijk}$ in terms of the gate estimates, Eq. (3.29), the problem we need to solve is minimization over the parameters $\vec{t}$ of

$$l(\hat{\mathcal{G}}) = \sum_{ijk} \left(m_{ijk} - \langle\langle\hat{E}(\vec{t})|\hat{F}_i(\vec{t})\hat{G}_k(\vec{t})\hat{F}_j(\vec{t})|\hat\rho(\vec{t})\rangle\rangle\right)^2/\sigma_{ijk}^2. \tag{3.33}$$

As we saw above, the estimator $\hat{p}_{ijk}$ is of order 5 or 10 in the parameters. Therefore this objective function is 10th to 20th order in the parameters, making this a non-convex optimization problem.

^7 Standardized optimization methods exist for convex (linear, quadratic) objective functions, which have a single minimum [?]. When the objective is nonlinear, with many local minima, there are no existing techniques that are guaranteed to find the global minimum. We must therefore use local optimization techniques and rely on a good starting point to put us close to the global minimum.
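The weighted least-squares objective of Eqs. (3.32) and (3.33) is straightforward to evaluate once the model probabilities are available as PTM products. The sketch below is illustrative: the ideal PTMs are written out by hand, the sampling variance is approximated by $m(1-m)/n$, and the lower bound of one count on the variance (to avoid division by zero when $m$ is exactly 0 or 1) is an assumption of this example rather than something specified in the text.

```python
import numpy as np

def predicted_probabilities(E, fiducials, gates, rho):
    """Model probabilities p_ijk = <<E|F_i G_k F_j|rho>>, indexed [k, i, j]."""
    A = np.array([E @ F for F in fiducials])
    B = np.array([F @ rho for F in fiducials]).T
    return np.array([A @ G @ B for G in gates])

def weighted_least_squares(m, p_hat, n_shots):
    """Objective of Eq. (3.32) with sigma^2 ~ m(1-m)/n, floored at one count."""
    m = np.asarray(m, dtype=float)
    sigma2 = np.maximum(m * (1 - m), 1.0 / n_shots) / n_shots
    return float(np.sum((m - np.asarray(p_hat)) ** 2 / sigma2))

# Ideal single-qubit PTMs in the (I, X, Y, Z) basis, written out explicitly.
NULL = np.eye(4)
X90 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]], dtype=float)
Y90 = np.array([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, -1, 0, 0]], dtype=float)
rho = np.array([1, 0, 0, 1]) / np.sqrt(2)   # |0><0| as a Pauli vector
E = np.array([1, 0, 0, 1]) / np.sqrt(2)     # projector onto |0>

gates = [NULL, X90, Y90]
fiducials = [NULL, X90, Y90, X90 @ X90]
p = predicted_probabilities(E, fiducials, gates, rho)

m_shifted = np.clip(p + 0.01, 0.0, 1.0)     # hypothetical data offset from the model
cost_exact = weighted_least_squares(p, p, n_shots=1000)
cost_shifted = weighted_least_squares(m_shifted, p, n_shots=1000)
```

In an actual fit, `p` would be recomputed from the parameter vector at every optimizer step while the data `m` stay fixed.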

It remains to express $\hat{p}_{ijk}$ in terms of the parameters $\vec{t}$, so that we have an explicit form for $l(\hat{\mathcal{G}})$ in Eq. (3.33) in terms of $\vec{t}$. There are two parameterizations for gates that are commonly used: (1) the Pauli process ($\chi$) matrix representation and (2) the Pauli transfer matrix (PTM) representation. Each has its advantages, so we present both.


3.5.1 Process matrix optimization problem 

The process matrix $\chi_G$ for a gate $G$ is defined in terms of the gate's action on an arbitrary state $\rho$ to produce a new state $G(\rho)$ as in Section 2.1.2,

$$G(\rho) = \sum_{i,j=1}^{d^2} (\chi_G)_{ij} P_i \rho P_j. \tag{3.34}$$

The gate action is expressed in terms of Pauli operators $P_i$, $P_j$ (see Section 2.1) acting on the state.

The $\chi$ matrix must be Hermitian positive semidefinite. This requirement follows from the Hermiticity of density matrices, $\rho^\dagger = \rho$, $G(\rho)^\dagger = G(\rho)$, and from the requirement of positive probabilities: $p_i = \mathrm{Tr}\{|i\rangle\langle i|\rho\} \geq 0$ for any pure state $|i\rangle$ and any density matrix $\rho$ (see Sec. 2.3). Any Hermitian positive semidefinite matrix can be written in terms of a Cholesky decomposition, $\chi = T^\dagger T$, where $T$ is a lower-triangular complex matrix with reals on the diagonal:

$$T = \begin{pmatrix} t_1 & 0 & 0 & 0 \\ t_5 + it_6 & t_2 & 0 & 0 \\ t_{11} + it_{12} & t_7 + it_8 & t_3 & 0 \\ t_{15} + it_{16} & t_{13} + it_{14} & t_9 + it_{10} & t_4 \end{pmatrix}. \tag{3.35}$$

^8 When performing numerical optimization, we typically approximate $p \approx m$ rather than $p \approx \hat{p}$ in the sampling variance, writing $\sigma^2 = m(1 - m)/n$. This keeps the objective function from blowing up.

In addition, there will be a constraint on $\chi$ due to the requirement of trace preservation of the state after the application of the gate: $\mathrm{Tr}\{G(\rho)\} = \mathrm{Tr}\{\rho\}$. This leads to the completeness condition, $\sum_{ij}\chi_{ij}P_jP_i = I$, which is equivalent to the following four equations,

$$\sum_{ij} \chi_{ij}\,\mathrm{Tr}\{P_iP_kP_j\} = \delta_{k0}, \quad k = 1, \ldots, 4. \tag{3.36}$$

This constraint reduces the number of free parameters for $\chi$ from 16 to 12. In practice, we leave the 16 parameters and introduce the constraint as an equality constraint in the numerical optimizer.
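The Cholesky parameterization of Eq. (3.35) maps any 16 real numbers to a Hermitian positive semidefinite $\chi$ matrix, which is why positivity needs no separate constraint in this representation. A minimal sketch, with hypothetical function names and zero-based indexing of the parameters ($t_1$ is `t[0]`), is:

```python
import numpy as np

def cholesky_factor(t):
    """Lower-triangular T of Eq. (3.35) from 16 real parameters (t[0] is t_1)."""
    t = np.asarray(t, dtype=float)
    T = np.zeros((4, 4), dtype=complex)
    T[np.diag_indices(4)] = t[:4]       # real diagonal t_1..t_4
    T[1, 0] = t[4] + 1j * t[5]          # t_5 + i t_6
    T[2, 1] = t[6] + 1j * t[7]          # t_7 + i t_8
    T[3, 2] = t[8] + 1j * t[9]          # t_9 + i t_10
    T[2, 0] = t[10] + 1j * t[11]        # t_11 + i t_12
    T[3, 1] = t[12] + 1j * t[13]        # t_13 + i t_14
    T[3, 0] = t[14] + 1j * t[15]        # t_15 + i t_16
    return T

def chi_from_parameters(t):
    """chi = T^dagger T, Hermitian and positive semidefinite by construction."""
    T = cholesky_factor(t)
    return T.conj().T @ T

rng = np.random.default_rng(3)
t_random = rng.normal(size=16)          # arbitrary, unconstrained parameters
chi = chi_from_parameters(t_random)
eigenvalues = np.linalg.eigvalsh(chi)
```

The trace-preservation condition of Eq. (3.36) is not enforced by this construction and would still have to be passed to the optimizer as an equality constraint.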

We now discuss parameterization of $E$ and $\rho$. Both are Hermitian positive semidefinite, so we may parameterize them via Cholesky decomposition in the same way as $\chi$. (Note only that they are $2\times2$ rather than $4\times4$ matrices, so that each will have 4 parameters.) The state $\rho$ has unit trace, and the measurement $E$ must be such that $I - E$ is also positive semidefinite. These conditions introduce additional constraints for optimization.

We can now write down an explicit form for the weighted least-squares objective function, Eq. (3.33), in terms of the parameter vector, $\vec{t}$. We expand the estimator $\hat{p}_{ijk}$ in terms of the $\chi$ matrices for its constituent gates using Eq. (3.34),

$$\begin{aligned}
\hat{p}_{ijk} &= \langle\langle\hat{E}(\vec{t})|\hat{F}_i(\vec{t})\hat{G}_k(\vec{t})\hat{F}_j(\vec{t})|\hat\rho(\vec{t})\rangle\rangle \\
&= \mathrm{Tr}\{\hat{E}\,\hat{F}_i(\hat{G}_k(\hat{F}_j(\hat\rho)))\} \\
&= \sum_{mnrstu} (\chi_{F_i})_{tu}(\chi_{G_k})_{rs}(\chi_{F_j})_{mn}\,\mathrm{Tr}\{\hat{E}P_tP_rP_m\hat\rho P_nP_sP_u\}.
\end{aligned} \tag{3.37}$$

Remember that each $\chi$ matrix as well as $\rho$ and $E$ is written as $T^\dagger T$ in terms of its own set of parameters (the $\chi$s are $4\times4$ and $\rho$ and $E$ are $2\times2$), and therefore $\hat{p}_{ijk}$ is generally a homogeneous function of order 10 in these parameters (it consists of 3 $\chi$-matrices plus $\rho$ and $E$). Also, it may be the case that some of the $F_i$, $F_j$ are composed of more than one gate $G \in \mathcal{G}$. In this case $\chi_{F_i}$ and $\chi_{F_j}$ can be decomposed further into process matrices for the gates in $F_i$, $F_j$, with corresponding additional Pauli matrix terms appearing inside the trace. This makes $\hat{p}_{ijk}$ a correspondingly higher order polynomial. For this reason we would like to avoid defining SPAM gates as combinations of gates in the gate set.
Problem statement

We can now state the optimization problem as follows.

$$\begin{aligned}
\text{Minimize:}\quad & \sum_{ijk}\Big(m_{ijk} - \sum_{mnrstu}(\chi_{F_i})_{tu}(\chi_{G_k})_{rs}(\chi_{F_j})_{mn}\,\mathrm{Tr}\{EP_tP_rP_m\rho P_nP_sP_u\}\Big)^2/\sigma_{ijk}^2 \\
\text{Subject to:}\quad & \sum_{mn}(\chi_G)_{mn}\,\mathrm{Tr}\{P_mP_rP_n\} - \delta_{0r} = 0 \quad \forall G \in \mathcal{G}, \\
& \mathrm{Tr}\{\rho\} = 1, \\
& I - E \succeq 0.
\end{aligned}$$

The wavy inequality denotes positive semidefiniteness. For each $G \in \mathcal{G}$ there is a $4\times4$ matrix $\chi_G = T_G^\dagger T_G$, where $T_G$ is a lower-triangular complex matrix with real entries on the diagonal, Eq. (3.35). There are 16 free parameters for each $G \in \mathcal{G}$, and each matrix element of $\chi_G$ is a 2nd-order polynomial in the $\{t_i\}$.

The matrices $\chi_F$ must be expressed in terms of matrices $\chi_G$. In the second example in Sec. 3.4.1 above, this is trivial since $\mathcal{F} = \mathcal{G}$. In the first example the only non-trivial case is $F_4 = G_1 \circ G_1$. The simplest way to find $\chi_{F_4}$ is to calculate the PTM for $F_4$ by multiplying the PTMs for $G_1$, and then transforming back to the $\chi$ representation. This can be implemented with a simple numerical routine.

$E$ and $\rho$ are parameterized similarly to $\chi$ as $T^\dagger T$, only they are $2\times2$ matrices. Each therefore contains 4 parameters.

Possible starting points are the target gate set or the LGST estimate (rather, the closest physical gate set to the LGST estimate). Typically LGST provides a better starting point than the target gate set.

3.5.2 Pauli transfer matrix optimization problem 

Next we discuss parameterization in terms of Pauli transfer matrices (PTMs). As we saw in Ch. 2, the composition of gates is represented as matrix multiplication of PTMs. This avoids the cumbersome trace terms appearing in equations such as Eq. (3.37). In addition, the PTM for each gate may be parameterized linearly rather than quadratically as we did for $\chi$. This reduces the order of the objective function by a factor of 2. The drawback is that the positivity constraint, corresponding to the positive semidefiniteness of $\chi$ that we imposed above via Cholesky decomposition, has a more complicated structure in terms of PTMs. It is expressed by the requirement of positive semidefiniteness of the so-called Choi-Jamiolkowski (CJ) matrix representing the quantum map, see Sec. 2.3.2. Imposing this type of constraint without Cholesky decomposition is difficult to do with standard nonlinear optimization techniques, but can be done using semidefinite programming (SDP) [?] if the objective function can be made quadratic.

The Pauli transfer matrix $R_G$ for a gate $G$ is defined in terms of the gate's action on Pauli matrices (see Section 2.1.3),

$$(R_G)_{ij} = \mathrm{Tr}\{P_iG(P_j)\}. \tag{3.38}$$

The PTM contains the same information about the map as does $\chi$. It is a $4\times4$ real-valued matrix with elements restricted to the interval $R_{ij} \in [-1, 1]$, and is generally not symmetric. Like $\chi$, the PTM has 16 parameters. The trace-preserving condition is $R_{0j} = \delta_{0j}$. These 4 equations reduce the number of parameters to 12.

In the PTM representation, density matrices $\rho$ in Hilbert space are written as vectors in Hilbert-Schmidt space and denoted as $|\rho\rangle\rangle$. The matrix elements (entries) of a general vector $|\rho\rangle\rangle$ are defined as

$$\rho_i = \langle\langle i|\rho\rangle\rangle = \mathrm{Tr}\{P_i\rho\}. \tag{3.39}$$

The left-hand term in the equality is the $i$-th component of the $4\times1$ vector $|\rho\rangle\rangle$. The middle term is this same component written in Dirac notation. In the right-hand term, $\rho$ is the density matrix in standard $2\times2$ representation, and $P_i$ is a Pauli matrix.

Using these definitions, the state $G(\rho)$ after application of the gate $G$ can be written as a vector $|G(\rho)\rangle\rangle$ in Hilbert-Schmidt space resulting from matrix multiplication by $R_G$,

$$|G(\rho)\rangle\rangle = R_G|\rho\rangle\rangle. \tag{3.40}$$

The standard density matrix form of $G(\rho)$ can be recovered from the vector $|G(\rho)\rangle\rangle$ using the definition of matrix elements, Eq. (3.39),

$$\mathrm{Tr}\{P_iG(\rho)\} = \langle\langle i|G(\rho)\rangle\rangle, \tag{3.41}$$

and the expansion of any density matrix in terms of Pauli matrices,

$$G(\rho) = \sum_{i=1}^{4} \mathrm{Tr}\{P_iG(\rho)\}P_i. \tag{3.42}$$

In Eq. (3.33), the weighted least-squares objective function has already been written in terms of PTMs. This was implicit in the superoperator notation we used to derive that equation. In terms of the $R$-matrix notation for PTMs, we rewrite Eq. (3.33) as

$$l(\hat{\mathcal{G}}) = \sum_{ijk} \left(m_{ijk} - \langle\langle\hat{E}(\vec{t})|R_{F_i}(\vec{t})R_{G_k}(\vec{t})R_{F_j}(\vec{t})|\hat\rho(\vec{t})\rangle\rangle\right)^2/\sigma_{ijk}^2. \tag{3.43}$$

We can parameterize each matrix $R$ linearly in terms of $\vec{t}$. This means each $R_{ij} = t_s$ for some index $s$. The term in double brackets, $\langle\langle\ldots\rangle\rangle$, in Eq. (3.43) is a scalar given by applying the indicated matrix multiplications of $R$-matrices to the vector $|\hat\rho\rangle\rangle$, and then scalar multiplying by $\langle\langle\hat{E}|$.

The state $|\rho\rangle\rangle$ and measurement $\langle\langle E|$ should also be parameterized linearly, and the constraints (positive semidefiniteness, Hermiticity, trace preservation of $\rho$) can be imposed during optimization.

Since each $R$ is linear in the parameters, $\vec{t}$, the matrix product in Eq. (3.43) is 5th order in the parameters, $\vec{t}$. Therefore the objective function, Eq. (3.43), is a 10th-order polynomial in $\vec{t}$.

As mentioned earlier, the positivity constraint is expressed as the positive semidefiniteness of the Choi-Jamiolkowski matrix, Eq. (2.16).
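The two physicality conditions on a PTM, trace preservation through its first row and complete positivity through the CJ matrix of Eq. (2.16), can be checked numerically. The sketch below is an illustration under an explicit normalization assumption: it uses unnormalized Pauli matrices with a factor $1/d$ in the PTM, which for $d = 2$ is equivalent to using Paulis normalized by $1/\sqrt{2}$. The function names and the two test maps are chosen for the example.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def ptm_of_map(channel):
    """R_ij = Tr{P_i channel(P_j)}/d for a single qubit (d = 2)."""
    return np.array([[np.trace(Pi @ channel(Pj)).real / 2 for Pj in PAULIS]
                     for Pi in PAULIS])

def choi_from_ptm(R):
    """CJ matrix of Eq. (2.16): (1/d^2) sum_ij R_ij P_j^T (x) P_i."""
    return sum(R[i, j] * np.kron(PAULIS[j].T, PAULIS[i])
               for i in range(4) for j in range(4)) / 4

def is_trace_preserving(R, tol=1e-9):
    """Trace preservation: the first row of the PTM is (1, 0, 0, 0)."""
    return bool(np.allclose(R[0], [1, 0, 0, 0], atol=tol))

def is_completely_positive(R, tol=1e-9):
    """Complete positivity: the CJ matrix has no negative eigenvalues."""
    return bool(np.linalg.eigvalsh(choi_from_ptm(R)).min() >= -tol)

U = np.cos(np.pi / 4) * I2 - 1j * np.sin(np.pi / 4) * X      # X_{pi/2}
R_x90 = ptm_of_map(lambda op: U @ op @ U.conj().T)           # a physical gate
R_transpose = ptm_of_map(lambda op: op.T)                    # positive but not CP

checks = {
    "x90": (is_trace_preserving(R_x90), is_completely_positive(R_x90)),
    "transpose": (is_trace_preserving(R_transpose), is_completely_positive(R_transpose)),
}
```

The transpose map passes the trace-preservation test and has all PTM entries in $[-1, 1]$, yet fails the CJ test, which illustrates why the entry bounds alone do not guarantee a physical gate.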
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Problem statement 


(ruijk - {{ E{t)\RFi{i)RGki^RFj{^\K^ ))) /(rfjk 

ijk 

pg = ^Y^RohPjyc^G 

*j=i 

{RG)oi = ^oi y G e G 

{RG),jG[-l,l] yG,i,j 
TV{p} = 1, 

I - E 

For each $G \in \mathcal{G}$ the 4x4 matrix $R_G$ is parameterized linearly, $(R_G)_{ij} = t_s$ for some index
s. $|\rho\rangle\rangle$ is a 4x1 vector and $\langle\langle E|$ is a 1x4 vector. $\rho$ and E are parameterized linearly
in such a way that they are Hermitian and positive semidefinite. The components $\rho_i$ of
$|\rho\rangle\rangle$ are given as $\rho_i = \mathrm{Tr}\{P_i \rho\}$ and similarly for $\langle\langle E|$.

The product in double brackets in the objective function is a scalar calculated by
matrix multiplication of the Pauli transfer matrices for the gates as indicated. For the
first example in Sec. 3.4.1 the matrix $R_{F_4} = (R_{G_1})^2$. Note that this choice increases the
order of the objective function in the parameters t, since $G_1$ appears twice in the same
F-gate. As a result, the 2nd example gate set in Sec. 3.4.1 is a better choice.

This version of the optimization problem has the advantage over the $\chi$-matrix version
in that the objective function is a 10th-order rather than a 20th-order polynomial
in t. This is still a highly nonlinear objective function. The tradeoff is the requirement
of positive semidefiniteness of the matrix $\rho_G$ for each G, which is not necessary in the
$\chi$-matrix approach. This requirement naturally suggests a solution in terms of a
semidefinite program (SDP). In Ref. [?], this is achieved by replacing the objective function
by a quadratic approximation in order to recast the problem as a convex optimization
problem, which is then solved by SDP. (Recall that convex optimization requires the
objective function to be of order 2 or less in the optimization parameters.) This
approximation amounts to linearizing the triple-gate product in the estimator $\hat p_{ijk}$ about the
target gate set, and eliminating the t-dependence of $\rho$ and E. The latter can be done
either by assuming the state and POVM are perfect, or by introducing imperfect ones by
hand or from the LGST estimate.

Finally, we note that a different gate set, such as the linear inversion estimate, may 
be used as the starting point for nonlinear optimization, or as the point about which the 
gate set is linearized for convex optimization. 


Chapter 4

Implementing GST


The data-processing challenge of GST boils down to that of solving a nonlinear
optimization problem. We are presented with measurements $m_i$ of a set of probabilities
$p_i = \langle\langle E | G_i(\rho) \rangle\rangle$, $i = 0, \ldots, K$. (Here we use a simplified notation with a single index i.) Based
on these measurements alone, we would like to find the gate set $\mathcal{G} = \{|\rho\rangle\rangle, \langle\langle E|, \{G_i\}\}$
that produced the data. Practically speaking, we would like to find the best estimate $\hat p_i$
of the true probabilities $p_i$, given the experimentally measured values $m_i$, where the $\hat p_i$
are functions of the gate set (i.e., of the parameters used to define the gate set). This
estimate should correspond to a physical (CPTP, see Sec. 2.3) set of gates.

Typically, each of the K experiments is repeated a sufficiently large number of times
N that the central limit theorem applies. Therefore $m_i$ can be considered a sample
from a Gaussian distribution.^1 Then, as we have seen, the problem of estimating $p_i$
can be rephrased as maximum-likelihood estimation (MLE) with an objective function
(log-likelihood) that is quadratic in the $\hat p_i$. Furthermore, if $G_i$ is a single gate (this is
the case in QPT), the objective function is quadratic in the gate parameters, and thus
convex. (A convex function has only a single local minimum, the global minimum, in
its domain of definition.) In this case, the problem can be solved using standard convex
optimization techniques [?].

Computation of the QPT estimate is therefore a solved problem (aside from some
technicalities, see Ref. [37]). For a linearly-parameterized gate set, it is straightforward
(though possibly computationally intensive) to determine the most likely CPTP gates
that produced the data, as well as the errors in the estimate. Unfortunately, QPT does
not solve the correct problem. For self-consistency, we require state preparation and
measurement (SPAM) gates to be included in the $\{G_i\}$, making the gate set at least
3rd order in the gate parameters. As a result the objective function is no longer convex
(it has many local minima) and the problem is no longer solvable by standard convex
optimization techniques. Instead one has to choose a combination of approximate and
iterative methods that one hopes can locate the global optimum.

^1 More accurately, $m_i$ is a sample from a binomial distribution, which can be approximated as Gaussian
when $p_i N$ and $(1 - p_i) N$ are not too small. In a hypothetical ideal experiment with no intrinsic SPAM
errors, it can happen that $p_i \approx 0$ or 1 for some values of i when the gate error is low. The Gaussian
approximation then breaks down and we must use the full binomial objective function. Since real
experiments are not perfect (noise affects initialization, readout, etc.) this should not be an issue.

Our goal in this chapter is to illustrate the effectiveness of GST. We use the simplest
optimization methods that get the job done. Thus, we implement full nonlinear
optimization in Matlab, choosing reasonable settings for the optimization routines but
making no effort to optimize these settings. We find this is sufficient to illustrate the
main features of GST.

We begin by summarizing the steps required to implement GST, including how to 
gather and organize the data, and some tips on the analysis. We then present the results 
of GST for a single qubit using simulated data with simplified but realistic errors. We 
compare the performance of maximum likelihood GST (ML-GST) to that of QPT for 
varying levels of coherent error, incoherent error, and sampling noise. We corroborate the 
result of Ref. that coherent errors are poorly estimated by QPT near QEC thresholds 
while GST is accurate in this regime. 

In Chapter 3 we described two versions of ML-GST, a nonlinear optimization problem
based on the process matrix (Sec. 3.5.1) and a semidefinite program (SDP) based
on the Pauli transfer matrix (Sec. 3.5.2). The numerical results in the present chapter
were obtained using the process-matrix-based approach. Due to the high nonlinearity
of the objective function in this approach, the LGST estimate was essential as a starting
point for MLE and provided better estimates than using the target gate set as a starting
point.

4.1 Experimental protocol 

Recalling the definitions from the last chapter, $\mathcal{G} = \{G_0, G_1, \ldots, G_K\}$ is the gate set,
with $G_0$ denoting the "null" gate (do nothing for no time, a perfect identity).
$\mathcal{F} = \{F_1, \ldots, F_N\}$ is the SPAM gate set. It is used to construct a complete basis of starting
states and measurements from a given particular starting state $\rho$ and measurement
operator E. Each gate $F_i \in \mathcal{F}$ is composed of gates in $\mathcal{G}$. For a single qubit, d = 2, we
must have $N \geq 4$. For simplicity, we can take N = 4.

The experimental protocol is as follows. 

1. Initialize the qubit to a particular state $|\rho\rangle\rangle$. In most systems, the natural choice
for $\rho$ is the ground state of the qubit, $\rho = |0\rangle\langle 0|$.

2. For a particular choice of $i, j \in \{1, \ldots, N\}$, $k \in \{0, \ldots, K\}$, apply the gate sequence
$F_i \circ G_k \circ F_j$ to the qubit. Remember that the F gates are composed of Gs. So the
gate sequence applied in this step is a sequence of gates $G \in \mathcal{G}$.

3. Measure the POVM E. E is required to be a positive semidefinite Hermitian
operator, such that $I - E$ is also positive semidefinite. (I is the identity.) As is
the case for $\rho$, the natural choice for E in most systems is $E = |0\rangle\langle 0|$. Sometimes
$E = |1\rangle\langle 1|$ is used.

4. Repeat steps 1-3 a large number of times, n. Typically, n = 1,000 to 10,000. For
the r-th repetition, record $n_r = 1$ if the measurement is a success (i.e., the measured
state is $|0\rangle\langle 0|$), and $n_r = 0$ if the measurement is a failure (i.e., the measured state
is not $|0\rangle\langle 0|$).

5. Average the results of step 4. The result, $m_{ijk} = \sum_{r=1}^{n} n_r / n$, is a measurement of
the expectation value $p_{ijk} = \langle\langle E | F_i G_k F_j | \rho \rangle\rangle$. It is a random variable with mean
$p_{ijk}$ and variance $p_{ijk}(1 - p_{ijk})/n$.

6. Repeat steps 1-5 for all $i, j \in \{1, \ldots, N\}$, $k \in \{0, \ldots, K\}$.

7. Optional: Repeat steps 1-5 to measure the expectation values $p_i = \langle\langle E | F_i | \rho \rangle\rangle$.
Since typically $F_1 = G_0 = \{\}$, this data will already be contained in the measurements
of $p_{ijk}$. However, it helps to have an independent measurement if possible.
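Steps 4 and 5 of the protocol can be mimicked in simulation. The following is a minimal sketch, assuming that each repetition is an independent Bernoulli trial with a known success probability; the function names and the fixed random seed are illustrative choices, not part of the protocol itself.

```python
import random

def run_experiment(p_true, n, rng):
    """Simulate n repetitions of one gate sequence.

    Each repetition records n_r = 1 with probability p_true and n_r = 0
    otherwise. Returns the sample mean m = (sum of n_r) / n.
    """
    successes = sum(1 for _ in range(n) if rng.random() < p_true)
    return successes / n

def sample_mean_variance(p, n):
    """Variance p(1 - p)/n of the sample mean of n Bernoulli trials."""
    return p * (1.0 - p) / n

rng = random.Random(0)  # fixed seed, for reproducibility only
m = run_experiment(0.5, 2000, rng)
sigma = sample_mean_variance(0.5, 2000) ** 0.5
```

Looping `run_experiment` over all index triples (i, j, k), with the success probability taken from the simulated gate set, would produce the full table of measured values $m_{ijk}$, and `sample_mean_variance` gives the corresponding weights for a weighted least-squares fit.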

4.2 Organizing and verifying the data 

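The procedure above gives measurements of the following quantities:

    $\langle\langle E | F_i \circ G_k \circ F_j | \rho \rangle\rangle$,
    $\langle\langle E | F_i \circ F_j | \rho \rangle\rangle$,
    $\langle\langle E | F_i | \rho \rangle\rangle$,

for $k = 1, \ldots, K$ and $i, j = 1, \ldots, N$.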

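The experimental data should then be organized into matrices, as follows (see Eq. (3.12)
and the related definitions in Chapter 3):

    $(\tilde G_k)_{ij} = \langle\langle E | F_i G_k F_j | \rho \rangle\rangle$,    (4.1)

    $g_{ij} = \langle\langle E | F_i F_j | \rho \rangle\rangle$,    (4.2)

    $|\tilde\rho\rangle\rangle_i = \langle\langle E | F_i | \rho \rangle\rangle = \langle\langle \tilde E|_i$,    (4.3)

where in an abuse of notation we have written $\langle\langle E | \ldots | \rho \rangle\rangle$ to stand for the measured
values of these quantities rather than the true values.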

Once data is obtained, and before proceeding further, we must check that the Gram
matrix, $g_{ij} = \langle\langle E | F_i F_j | \rho \rangle\rangle$, is nonsingular, so that it may be inverted to find the estimate
as in Eqs. (3.21) - (3.23). We want the smallest magnitude eigenvalue to be as large as
possible. This is because the sampling error on the estimate scales roughly as the inverse
of the smallest eigenvalue of the Gram matrix (multiplied by the sampling error in the
data). A good rule of thumb is that no eigenvalue be less than 0.1 in absolute value.
(Then for N = 2000 samples the sampling error in the estimate is bounded at 5%.)
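The 5% figure follows from a one-line estimate. As a rough worked check, assume the bound of 0.005 on the sampling error in the data for N = 2000 that is quoted in Sec. 4.4, and a smallest Gram-matrix eigenvalue of magnitude 0.1; the symbols $\delta_{\rm data}$, $\delta_{\rm est}$ and $\lambda_{\min}$ are introduced here only for illustration.

```latex
\delta_{\rm est} \;\lesssim\; \frac{\delta_{\rm data}}{|\lambda_{\min}(g)|}
  \;=\; \frac{0.005}{0.1} \;=\; 0.05 .
```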

If the eigenvalues of the Gram matrix are very small, this indicates that the SPAM
gates are only marginally linearly independent. Viewed as vectors, they are highly
overlapping. In this case, the experimenter must go back and tweak the knobs of the
experiment to make the SPAM gates more orthogonal. Then the Gram matrix must be
measured again and its eigenvalues checked. This process is repeated until a suitable
Gram matrix is obtained.^2

4.3 Linear inversion 

Once we have checked that the experimental Gram matrix is invertible, we apply its
inverse to the data matrices in Eqs. (4.1) - (4.3), following the procedure in Chapter 3.
(Note that $g^{-1}$ is applied to all data matrices except the vector $\langle\langle \tilde E|$.) We obtain the
following estimates for the gates and states:

    $|\hat\rho\rangle\rangle = g^{-1} |\tilde\rho\rangle\rangle$,    (4.4)

    $\langle\langle \hat E| = \langle\langle \tilde E|$,    (4.5)

    $\hat G_k = g^{-1} \tilde G_k$.    (4.6)

Since $g = \tilde G_0$, the LGST estimate of the null gate is exactly the identity, as it should be.
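The linear-inversion step amounts to one matrix inversion and a few matrix products. The sketch below illustrates it in plain Python, using a Gauss-Jordan inverse so that no external library is needed; the function names and the small 2x2 toy matrices are assumptions chosen for illustration, not data from the text.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment each row of m with the corresponding row of the identity.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("Gram matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def lgst_gate_estimates(gram, data_matrices):
    """Return g^{-1} G~_k for each measured data matrix G~_k, as in Eq. (4.6)."""
    g_inv = inverse(gram)
    return [matmul(g_inv, gk) for gk in data_matrices]

# Toy example: the first data matrix equals the Gram matrix (null gate),
# so its estimate is the identity up to rounding.
g = [[2.0, 1.0], [1.0, 3.0]]
estimates = lgst_gate_estimates(g, [g, [[4.0, 2.0], [2.0, 6.0]]])
```

The explicit singularity check plays the role of the Gram-matrix verification in Sec. 4.2: a nearly singular g amplifies sampling noise, so a production implementation would more likely inspect the eigenvalues or the condition number than rely on a fixed pivot threshold.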

The gate set estimated in this way, $\hat{\mathcal{G}} = \{|\hat\rho\rangle\rangle, \langle\langle \hat E|, \{\hat G_k\}\}$, is in a different gauge
from the actual gate set (Secs. 3.4.2 - 3.4.4). To compensate, we transform to a more
useful gauge. Since we do not know the actual gate set, only the intended (target) one,
the most useful gauge is the one that brings the estimated gate set as close as possible
(based on some distance metric) to the target. The difference between the final estimate
and the target gate set is then a measure of the error in the actual gate set (how far
away it is from what was intended).

The gauge transformation is found by solving the optimization problem defined in
Eq. (3.25). The resulting gauge matrix, $B^*$, is then applied to Eqs. (4.4) - (4.6). The final
LGST estimate is given by Eqs. (3.26) - (3.28), which we reproduce here:

    $\hat G_k^* = B^* \hat G_k (B^*)^{-1}$,    (4.7)

    $|\hat\rho^*\rangle\rangle = B^* |\hat\rho\rangle\rangle$,    (4.8)

    $\langle\langle \hat E^*| = \langle\langle \hat E| (B^*)^{-1}$.    (4.9)

The linear inversion protocol is fairly easy to implement numerically and is useful for
providing quick diagnostics without resorting to more computationally intensive
estimation via constrained optimization. Since the LGST estimate is not generally physical,^3
the information obtained is somewhat qualitative. Nevertheless, large enough errors can
be easily detected with this approach.

More importantly, LGST is useful as a starting point for MLE. Since the starting 
point must be physical, while LGST is not, we should use the closest physical gate set 
to the LGST estimate. Such a gate set can be found in a similar way to the gauge 
optimization procedure described above, choosing a metric such as the trace norm as a 
measure of distance between gates. 

^2 Typically experimentalists have other means at their disposal to ensure orthogonal, or nearly
orthogonal, gate rotation axes. Thus it is not difficult to obtain an invertible Gram matrix in practice.

^3 There is no natural way to put physicality constraints into the LGST protocol. One option is to find
the closest physical gate set to the LGST estimate, but this is suboptimal to MLE.

4.4 Estimation results and analysis for simulated experimental data


In this section we present results of maximum-likelihood-based GST (ML-GST) for
several examples using simulated data, and compare to ML-QPT. The simulated errors
are of three different types, representing coherent and incoherent gate errors as well as
intrinsic SPAM errors. The MLE approach was discussed in Section 3.5.

Maximum likelihood estimation provides a convenient way to handle physicality
constraints as well as overcomplete data (we will get to that later). Although it has been
argued that MLE is suboptimal to Bayesian methods [37], it is still the method of choice
for GST and QPT [?]. We shall see that MLE is highly accurate.

As a model system, we consider a single qubit with available gates consisting of
rotations about two orthogonal axes, which we label X and Y. This is the case in many
existing qubit implementations [22]. Furthermore, we assume the qubit can be
prepared in its ground state, $\rho = |0\rangle\langle 0|$, and measured in the Z-basis, distinguishing
$E = |1\rangle\langle 1|$ and $I - E = |0\rangle\langle 0|$. This state and measurement may be faulty, and we will
study the effect on estimation due to these errors.

The results in this section were obtained using gate set 2 in Sec. 3.4.1,

    $\mathcal{G} = \{\{\}, X_{\pi/2}, Y_{\pi/2}, X_{\pi}\}$,    (4.10)

    $\mathcal{F} = \mathcal{G}$.    (4.11)

To illustrate the effect of systematic errors (all errors with the exclusion of statistical
sampling noise are systematic errors), we use depolarizing noise as an example of
incoherent environmental noise, and over-rotations in the Y-gate as an example of a coherent
control error. Depolarizing noise is defined by the map

    $G_{\rm dep}(\rho) = (1 - 3p)\rho + p\,(X\rho X + Y\rho Y + Z\rho Z)$.    (4.12)

For a 50 ns gate, a depolarizing parameter p = 0.005 corresponds to a decoherence
time of 2.5 μs. In our numerical experiments we apply $G_{\rm dep}$ after every gate where
depolarizing noise is required.
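For concreteness, the PTM of the depolarizing map in Eq. (4.12) and its gate error can be computed in a few lines. The sketch below is illustrative only: it assumes the basis ordering (I, X, Y, Z), uses the fact that conjugating a Pauli matrix by a different Pauli flips its sign (so each of X, Y, Z is scaled by 1 - 4p), and applies the standard trace formula for average fidelity relative to an ideal identity gate. The function names are hypothetical.

```python
def depolarizing_ptm(p):
    """PTM of Eq. (4.12): the X, Y and Z components are each scaled by 1 - 4p."""
    s = 1.0 - 4.0 * p
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, s, 0.0, 0.0],
            [0.0, 0.0, s, 0.0],
            [0.0, 0.0, 0.0, s]]

def infidelity_to_identity(ptm, d=2):
    """Return 1 - F, with F = (Tr R + d) / [d (d + 1)] for an identity target."""
    trace = sum(ptm[i][i] for i in range(len(ptm)))
    return 1.0 - (trace + d) / (d * (d + 1))

# Gate error for the depolarizing parameter quoted in the text.
gate_error = infidelity_to_identity(depolarizing_ptm(0.005))
```

Under these assumptions the gate error comes out as 2p, so p = 0.005 corresponds to a gate error of 0.01.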

Over-rotation errors are obtained by applying the map

    $G_{\rm rot}(\rho) = \exp\left(-i \frac{\epsilon}{2}\, \hat n \cdot \vec\sigma\right) \rho\, \exp\left(i \frac{\epsilon}{2}\, \hat n \cdot \vec\sigma\right)$,    (4.13)

after every gate that should have the error, in our case the Y-gate. The parameter $\epsilon$ is
the angle of over-rotation, $\hat n = \hat y$ is the rotation axis, and $\vec\sigma = (X, Y, Z)$ is a vector of
Pauli matrices.
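The over-rotation map of Eq. (4.13) has a simple PTM: it rotates the Bloch vector about the y axis by the angle $\epsilon$. The following sketch, which assumes the basis ordering (I, X, Y, Z) and the standard trace formula for average fidelity relative to the identity, computes that PTM and the resulting gate error. The function names are hypothetical.

```python
import math

def y_rotation_ptm(eps):
    """PTM of a rotation by angle eps about the y axis, in the basis (I, X, Y, Z)."""
    c, s = math.cos(eps), math.sin(eps)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, 0.0, s],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, -s, 0.0, c]]

def rotation_infidelity(eps, d=2):
    """Return 1 - F for the over-rotation, with F = (Tr R + d) / [d (d + 1)]."""
    ptm = y_rotation_ptm(eps)
    trace = sum(ptm[i][i] for i in range(4))
    return 1.0 - (trace + d) / (d * (d + 1))

eps = math.radians(4.0)              # a 4 degree over-rotation
exact = rotation_infidelity(eps)     # equals (1 - cos(eps)) / 3
small_angle = eps ** 2 / 6           # leading-order approximation
```

The off-diagonal entries of this PTM are of size sin(eps), which is first order in the angle, whereas the infidelity is second order. This is why a coherent error can remain visible in the PTM even when the gate error itself is small.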

In addition to these systematic errors, there will be noise due to finite sampling
statistics. As an example, N = 2000 samples per experiment produces a sampling error
in the data that is upper-bounded by $1/(4\sqrt{N}) \approx 0.005$.


4.4.1 Systematic errors 

We first consider systematic errors only, no sampling error. This is useful for examining
how well GST can do relative to QPT in principle. In practice, sampling error will
contaminate both GST and QPT estimates, and efforts must be taken to reduce it. This
can be done either by taking more samples (which can be impractical) or performing
more independent experiments.

We calculate estimates using the process matrix formulation of the MLE problem,
see Sec. 3.5.1. This is a nonlinear optimization problem requiring a starting point to be
specified. For the starting point, we use the LGST estimate for GST and the target gate
set for QPT. More precisely, for GST we find the closest physical gate set to the LGST
estimate (based on the trace norm) and use that for a starting point.

Since the number of samples is taken as infinite (corresponding to zero sampling
error), we use a standard (non-weighted) least squares objective function,

    $l(\hat{\mathcal{G}}) = \sum_{i=1}^{N_{\rm exp}} (m_i - \hat p_i)^2$,    (4.14)

where $N_{\rm exp}$ is the number of experiments (16 for QPT, 84 for GST) and $m_i$ are the
measured values of the true probabilities $p_i$, of which $\hat p_i$ are our estimates. Since the
sampling error is zero, $m_i = p_i$.

The estimates in this section were generated using Matlab's built-in optimization
function fmincon, running the active-set optimization algorithm. Optimization time
varied between 2-4 minutes for each GST estimate on an Intel(R) Core(TM) i5-2500K
(3.3GHz) processor. Optimizer settings were options.TolFun = 10“^^, options.TolCon =
10“®, options.MaxFunEvals = 15000. Typically the objective function and constraint
tolerances were satisfied before the maximum number of function evaluations was reached.

Example 1: Over-rotation (coherent) error 

Fig. 4.1 shows the estimation error of QPT and GST as a function of gate error for an
over-rotation in the Y-gate, which is a type of coherent error. Gate error is defined as
$\epsilon_{\rm gate} = 1 - F(\mathrm{actual}, \mathrm{ideal})$, where $F(\mathrm{actual}, \mathrm{ideal})$ is the average fidelity of the actual
gate relative to the ideal gate.^4 Estimation error is defined similarly, as
$\epsilon_{\rm est} = 1 - F(\mathrm{estimated}, \mathrm{actual})$. We are able to use fidelity as a metric because the estimates are
constrained to be physical. (For unphysical gates, $0 \leq F \leq 1$ may not hold.)

We make the following observations: (1) QPT attributes error to all gates, even
though only a single gate ($Y_{\pi/2}$) is actually faulty. This is because the faulty gate is used
in SPAM for all gates. (2) The GST estimation error is flat as a function of gate error,
and represents the threshold for the optimizer.^5 (3) In the regime of gate error relevant
to QEC thresholds ($\epsilon$ = 10“^ to 10“^), and given the numerical tolerances, GST is
several orders of magnitude more accurate than QPT.

Fig. 4.2 shows the PTM coefficients for the gate set at a representative gate error
(infidelity) of $8.5 \times 10^{-4}$. Results are shown for the actual gates and for the GST and
QPT estimates. Both GST and QPT are good at catching the error on the faulty ($Y_{\pi/2}$)
gate, but QPT also attributes rotations to the other gates. The relative errors can also
be seen in Fig. 4.3, which shows the difference between the estimated and actual PTMs.

^4 Average fidelity is defined as $F(A, B) = \int d\rho\, \mathrm{Tr}\{\rho\, A^{-1} \circ B(\rho)\}$, where A and B are quantum
operations, and the integral is over the uniform (Haar) measure on the space of density matrices.
The average fidelity may be expressed in terms of Pauli transfer matrices $R_A, R_B$ as
$F(A, B) = (\mathrm{Tr}\{R_A^{-1} R_B\} + d)/[d(d + 1)]$, where d is the dimension of the Hilbert space. See Ref. [?].

^5 We use a tolerance of $\epsilon$ = 10“^^ on the (unweighted) least-squares objective function (see text),
which corresponds to an estimation error of approximately 10“^. This can be derived by assuming an
error in each entry of the PTM equal to the rms value of the residual in the objective function, $\sqrt{\epsilon/N_{\rm exp}}$,
where $N_{\rm exp} \approx 10^2$ is the number of experiments. The estimation error is approximately equal to the
error in the PTM elements, as can be verified using the trace formula in footnote 4 above.

[Figure: plot of estimation error $\epsilon$(estimated, actual) versus gate error.]

Figure 4.1: Estimation error vs gate error for an over-rotation in $Y_{\pi/2}$. Blue dots are
ML-GST and green dots are ML-QPT. The gate error is related to the over-rotation
angle approximately as $\epsilon_Y = (\delta\theta_Y)^2/6$. The range of angles corresponding to the given
range ($10^{-5}$ to $10^{-1}$) of gate error is 0.4° to 44.4°. The vertical dashed line indicates the
selected value of over-rotation error (4°, $\epsilon = 8.5 \times 10^{-4}$) for which the PTMs are plotted
on the following pages.

[Figure: PTM plots in three columns labeled actual, GST, and QPT.]

Figure 4.2: Pauli transfer matrices for GST and QPT maximum likelihood estimates (2nd
and 3rd columns). The actual gate set (1st column) contains an over-rotation error of 4°
in the $Y_{\pi/2}$ gate, corresponding to a gate error (infidelity) of $\epsilon$(actual, ideal) $= 8.5 \times 10^{-4}$.
GST correctly identifies the gate containing the error, while QPT attributes an
over-rotation error to all gates.

Figure 4.3: Pauli transfer matrices for GST and QPT maximum likelihood estimates
(2nd and 3rd columns) for the same data as in Fig. 4.2, with actual PTMs (column 1)
subtracted off. The GST errors are about 1% of QPT errors, and are not visible on this
scale.

Example 2: Depolarization (incoherent) error 

The depolarization map given by Eq. (4.12) is often used as a model of incoherent
environmental noise. The depolarizing parameter p is related to the gate error as $\epsilon = 2p$.^6
The estimation errors of QPT and GST as a function of gate error for depolarizing
noise are plotted in Fig. 4.4. As a measure of estimation error, we use the spectral
norm (as implemented in Matlab with the function norm) of the difference between
the estimated and actual gates. We use spectral norm rather than infidelity as in the
previous example because the optimization routine was unable to satisfy the tolerance
on the physicality constraints in the present case.^7 Although the constraint violation
was small (< 10“^ deviation in each element of the top row of the PTM), it was enough to
invalidate fidelity as a metric.

As for the previous (over-rotation) example, we plot the PTMs for the estimated
gates as well as the differences between estimated and actual PTMs for a representative
gate error of $8.5 \times 10^{-4}$, the same as in the over-rotation example. At a given level
of systematic noise, the estimation error of QPT is smaller for incoherent errors than
coherent errors.

According to Fig. 4.4, the estimation errors of QPT and GST are about the same.^8
However, Fig. 4.6 shows that GST does a better job of estimating the non-zero coefficients
of the PTM. These coefficients are $1 - 4p$, where p is the depolarizing parameter. Therefore
GST gives a better estimate of p than does QPT. This is one illustration of the danger
in relying on a single parameter as a metric of gate or estimation quality.

^6 This can be derived using the formula for average fidelity in footnote 4.

^7 Technically, the diamond norm is a more correct metric for gate distance than the spectral
norm. However, the spectral norm is easier to calculate. Since we are only interested in comparing the
relative distance of QPT and GST estimates from the actual gate, and our simulated gates are exactly
constrained to the single-qubit Hilbert space, the spectral norm is sufficient.

^8 Based on the previous analysis of coherent errors, we expect the GST estimate not to vary as a
function of the depolarizing parameter, since depolarizing noise is treated self-consistently in the same
way. We expect this to be achievable using a more robust optimization routine than the one we use.

Figure 4.4: Estimation error vs gate error for depolarizing noise on all gates. Blue dots
are ML-GST and green dots are ML-QPT. Estimation error is given by the spectral
norm of the difference between the estimated and actual PTMs and gate error is the
infidelity between the actual and ideal PTMs. The gate error is proportional to the
depolarizing parameter. The vertical dashed line indicates the selected value of gate
error ($\epsilon = 8.5 \times 10^{-4}$) for which the PTMs are plotted on the following pages. This is
the same error magnitude that was selected for the over-rotation example above. GST
estimation error is about the same as QPT. See text for discussion.

Figure 4.5: Pauli transfer matrices for GST and QPT maximum likelihood estimates
(2nd and 3rd columns). The actual gate set (1st column) contains a depolarizing gate
error (infidelity) of $\epsilon$(actual, ideal) $= 8.5 \times 10^{-4}$, the same magnitude of gate error
as in Fig. 4.2. QPT estimation error for depolarizing noise is smaller than that for
over-rotation error and is not visible in this plot.

Figure 4.6: Pauli transfer matrices for GST and QPT maximum likelihood estimates
(2nd and 3rd columns) for the same data as in Fig. 4.5, with actual PTMs (column
1) subtracted off. The actual gate set (1st column) contains a depolarizing gate error
(infidelity) of $\epsilon$(actual, ideal) $= 8.5 \times 10^{-4}$. The QPT estimation error is about an
order of magnitude smaller overall for depolarizing noise than for over-rotation error,
see Fig. 4.3. The GST and QPT errors are comparable in magnitude, but QPT gives a
worse estimate of the depolarizing parameter, which corresponds to the non-zero entries
in the actual PTM.

Example 3: Intrinsic SPAM error

As an example of intrinsic SPAM error, we assume a faulty initial state $\rho$, produced by
applying depolarizing noise of strength p, Eq. (4.12), to an ideal initial state. $\langle\langle E|\rho\rangle\rangle = 0$
in the ideal case, and $\langle\langle E|\rho\rangle\rangle = 2p$ is a measure of the state error. This is the same
value as would be obtained for the gate error had the depolarizing noise been applied
to the gate instead of the state. Although depolarizing noise commutes with all gates
(the $i, j = 2, \ldots, d^2$ sub-matrix is proportional to the identity), the example considered
here is not equivalent to that in the last section. This is because in the present case
there is a single depolarizing operation acting in each experiment, whereas the number of
depolarizing operations in the last section varied between 1 and 3, depending on which
operators appeared in $\langle\langle E | F_i G_k F_j | \rho \rangle\rangle$.

As could be expected from the results on depolarizing noise in the last section, the
error in the QPT estimate grows linearly with the error on $\rho$. In fact, the QPT estimation
error is almost exactly equal to the initial state error, showing that QPT attributes the
noise to the gates rather than the state. This makes sense, since QPT assumes ideal
initial states, and is confirmed by looking at the difference between the estimated and
actual PTMs as in Fig. 4.9. In contrast, GST is insensitive to the initial state error, and
the GST estimation error is at the optimizer threshold (see footnote 5).

4.4.2 Sampling noise 

In this section we repeat the first example from the last section, over-rotations in the
Y-gate, but this time in the presence of sampling noise and intrinsic SPAM error.
This situation is closer to real life, and illustrates what happens in general. We take
$N_{\rm samples} = 10000$, $\langle\langle E|\rho\rangle\rangle = 0.01$, both of which could reasonably occur in an
experiment. This value of $N_{\rm samples}$ corresponds to a sampling error of about 0.01 per PTM
entry, so both errors are about the same order of magnitude. We expect that, as we
vary the strength of the systematic error, QPT will be swamped by both sampling and
intrinsic SPAM error until the systematic gate error rises above the level (0.01) of these
errors. We expect GST, on the other hand, to be insensitive to intrinsic SPAM but to
also be unable to detect the systematic error unless it is larger than the sampling error.

The results turn out to be more subtle. Depending on the type of error, it may be
detectable at a lower gate error than the sampling error. This is the case for over-rotations.
Fig. 4.10 shows the estimation error vs gate error for our example. As expected, the
estimation error is flat until the gate error exceeds 0.01, both for QPT and GST. QPT is
limited by intrinsic SPAM while GST is not. Above 0.01, the estimation error behaves
as in Fig. 4.1: the QPT values increase while GST remains flat. However, the PTMs
give a fuller picture. Figs. 4.11 and 4.12 show the PTMs for a gate error near $10^{-3}$,
well below the crossover point. GST is still able to find the over-rotation error. This
is because, for a given gate error, the magnitude of non-zero PTM entries due to the
over-rotation error is larger than the average magnitude of PTM entries for sampling
noise.
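A rough worked estimate makes this comparison concrete. It assumes the small-angle relation $\epsilon \approx (\delta\theta)^2/6$ quoted in the caption of Fig. 4.1, and takes the off-diagonal PTM entries produced by a rotation through $\delta\theta$ to be $\sin\delta\theta \approx \delta\theta$. For the representative gate error of $8.5 \times 10^{-4}$:

```latex
|\Delta R_{ij}| \;\approx\; \delta\theta \;\approx\; \sqrt{6\,\epsilon}
  \;=\; \sqrt{6 \times 8.5 \times 10^{-4}} \;\approx\; 0.07
  \;>\; 0.01 \;\;\text{(sampling error per PTM entry)} .
```

Under these assumptions the PTM signature of the over-rotation is several times the sampling error per entry, even though the gate error itself is roughly ten times smaller than that sampling error.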

The results in this section were generated by numerical optimization in Matlab, as
in Sec. 4.4.1. Everything stated in that section regarding the optimization carries over
to here, except that the objective function used was weighted least squares, and the
objective function tolerance was set to options.TolFun = 10“®. The reason is that in
the presence of sampling noise, the weighted least-squares objective function is equal to
the chi-squared, which is of order $N_{\rm exp}$. Without sampling noise, the unweighted least
squares objective function is near zero, hence the smaller function tolerance in that case.

Figure 4.7: Estimation error vs gate error for depolarizing noise on the initial state.
Blue dots are ML-GST and green dots are ML-QPT. Estimation error is given by the
infidelity as in Fig. 4.1. Initial state error is equal to $\langle\langle E|\rho\rangle\rangle$ in the present case, where
$\langle\langle E|$ and $|\rho\rangle\rangle$ are orthogonal. The vertical dashed line indicates the selected value of
state error ($\epsilon = 8.5 \times 10^{-4}$) for which the PTMs are plotted on the following pages.
This is the same error magnitude that was selected for the over-rotation example above,
and corresponds to what the infidelity would be if the depolarizing noise was attributed
to the gate rather than the state.

Figure 4.8: Pauli transfer matrices for GST and QPT maximum likelihood estimates
(2nd and 3rd columns). The actual gate set (1st column) contains no error. The error
is a depolarizing noise of strength $\langle\langle E|\rho\rangle\rangle = 8.5 \times 10^{-4}$ applied to the initial state.

Figure 4.9: Pauli transfer matrices for GST and QPT maximum likelihood estimates
(2nd and 3rd columns) for the same data as in Fig. 4.8, with actual PTMs (column 1)
subtracted off. The error is a depolarizing noise of strength $\langle\langle E|\rho\rangle\rangle = 8.5 \times 10^{-4}$ applied
to the initial state. QPT incorrectly attributes a depolarization error to all gates.

Figure 4.10: Estimation error vs gate error for over-rotation in the Y-gate, including
sampling noise and intrinsic SPAM errors with parameters $N_{\rm samples} = 10000$,
$\langle\langle E|\rho\rangle\rangle = 0.01$. Blue dots are ML-GST and green dots are ML-QPT. Estimation error is given by
the infidelity as in Fig. 4.1. The vertical dashed line indicates the selected value of gate
error ($\epsilon = 8.5 \times 10^{-4}$) for which the PTMs are plotted on the following pages.


[Figure: PTM plots in three columns labeled actual, ML-GST, and QPT.]

Figure 4.11: Pauli transfer matrices for GST and QPT maximum likelihood estimates
(2nd and 3rd columns) of the gate set in column 1. The actual gate set shown in column
1 contains an over-rotation error of 4° in the Y-gate, corresponding to a gate error of
$8.5 \times 10^{-4}$. Also input to the estimation were sampling noise and intrinsic SPAM errors
with parameters $N_{\rm samples} = 10000$, $\langle\langle E|\rho\rangle\rangle = 0.01$.

Figure 4.12: Pauli transfer matrices for GST and QPT maximum likelihood estimates
(2nd and 3rd columns) for the same data as in Fig. 4.11, with actual PTMs (column 1)
subtracted off. The actual gate set (1st column) contains an over-rotation error of 4° in
the Y-gate, corresponding to a gate error of $8.5 \times 10^{-4}$. Also input to the estimation
were sampling noise and intrinsic SPAM errors with parameters $N_{\rm samples} = 10000$,
$\langle\langle E|\rho\rangle\rangle = 0.01$.

Chapter 5

Summary and outlook


We have seen that gate set tomography is a robust and powerful tool for the full
characterization of quantum gates. In the presence of SPAM errors, GST is accurate (within
the limits of sampling error) while QPT typically overestimates the gate error. This
discrepancy can be several orders of magnitude in the regime of gate error relevant
to quantum error correction. GST is also capable of providing qualitative information
about systematic gate errors, via the estimated Pauli transfer matrices, that is not
accessible by QPT in the presence of SPAM error. Nevertheless, several topics remain
the subject of current research. These include reduction of sampling error, treatment of
non-Markovian noise, detection of leakage and extra dimensions, and tractable extension
to multiple qubits.

Of these, the extension to multiple qubits seems the most clear-cut. For multiple 
qubits, the formalism of GST is the same, but there is a formidable processing challenge 
because of the large amount of experimental data necessary (over 4,000 experiments for 
2-qubit GST, as compared to fewer than 100 for 1-qubit). The nonlinear optimization routines we 
used to illustrate GST in Ch. [4] would likely take an unreasonable amount of time even 
for 2 qubits, assuming they run at all. The SDP method discussed in Ch. [3] has been 
reported to run slowly for 2 qubits as well [10]. One solution to this problem, 
proposed by IBM, is to use a type of semidefinite program used for compressed-sensing 
problems, called a first-order conic solver. Preliminary reports indicate that this 
runs much faster than standard SDP [10]. 

Robin Blume-Kohout has proposed the inclusion of gate repetitions in the GST gate 
set in order to reduce sampling error and detect non-Markovian noise [11]. Some initial 
evidence of the usefulness of this approach was presented in Ref. [9]. The detection of 
extra dimensions (i.e., due to the system leaving the qubit Hilbert space) is also 
an active topic of research [11, 40]. 


One topic we have not discussed in detail but that deserves fuller attention is estimating 
the error in the GST estimates. This is important in order to determine the resource 
requirements (number of experiments, gate repetitions, etc.) for obtaining high-quality 
GST estimates within the tolerances required by QEC. It is a pressing question whether 
the required accuracy can be tractably obtained for two-qubit gates. E.g., for a CNOT 
gate below the surface code threshold - probability of gate error ~ 10“^ - we would like 
the error in the GST estimate of this quantity to be below roughly 10“^. 

Part of the error in the maximum-likelihood estimate comes from sampling noise, part 
from any approximations used in the objective function, and part from the optimization 
routine (which caused the point-to-point variability in the plots in Ch. [3] of estimation vs 
gate error in the absence of sampling error). Neglecting errors due to the optimization 
itself, it is important to understand how statistical sampling errors propagate through 
the MLE procedure. Standard errors for MLE estimates (see, e.g., Ref. [41], Ch. 14) are 
valid when these estimates can be shown to be asymptotically (in the limit of a large number of 
experiments) normally distributed. In this case, an asymptotically valid estimator for 
the error can be written in terms of the covariance matrix of the estimated parameter 
vector t, which can be calculated from the MLE estimate and the objective function. 
However, the MLE estimates for GST may not be asymptotically normally distributed 
due to the constraint that t must describe a physical map. In fact, for very small errors 
(the regime we are interested in) the gates will be very close to ideal unitaries, which lie 
on the boundary of allowable maps. Therefore asymptotic normality cannot be assumed, 
and it is unclear how accurate this approach would be for our problem. 
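
To make the standard-error formula concrete, the following minimal sketch works it out for a toy one-parameter problem rather than for the full GST parameter vector t. It assumes a single outcome probability p estimated from k counts in n shots; the function name and the toy model are illustrative assumptions, not part of the GST procedure itself. The asymptotic standard error is the square root of the inverse observed Fisher information, and the boundary branch shows how the formula degenerates when the estimate sits on the edge of the allowed region.

```python
import math


def mle_with_standard_error(k, n):
    """Return (p_hat, standard_error) for k successes in n Bernoulli shots.

    Toy stand-in for a GST parameter: the log-likelihood is
        L(p) = k*log(p) + (n - k)*log(1 - p),
    whose maximizer is p_hat = k/n and whose observed Fisher information
    is -L''(p_hat) = n / (p_hat * (1 - p_hat)).
    """
    p_hat = k / n
    if p_hat in (0.0, 1.0):
        # On the boundary of [0, 1] the curvature formula is undefined and
        # the plug-in variance p_hat*(1 - p_hat)/n collapses to zero, so the
        # reported error is not trustworthy here.
        return p_hat, 0.0
    fisher_information = n / (p_hat * (1.0 - p_hat))
    return p_hat, math.sqrt(1.0 / fisher_information)


if __name__ == "__main__":
    # Interior estimate: the asymptotic formula gives a sensible error bar.
    print(mle_with_standard_error(5000, 10000))
    # Boundary estimate: the formula reports zero uncertainty.
    print(mle_with_standard_error(10000, 10000))
```

In the multi-parameter case the scalar Fisher information becomes a matrix and the standard errors come from the diagonal of its inverse, but the boundary problem illustrated by the second call is the same.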

A commonly used approach to estimating the error in the ML estimate is to 
resample from experimental data, known as bootstrapping. In Ch. [4] we were able 
to plot estimation error vs gate error because we knew what the actual gates were. 
However, the goal of GST is to estimate an unknown set of gates. Operationally, the 
variance in the estimate can be found by repeating the same experiment many times, 
each time generating new data and a new best fit. Since the amount of work to do this 
can be impractical (and if it isn’t, one would prefer to include this additional data in a 
single, larger sample to produce a tighter estimate), resampling with replacement from 
the same data (say a set of single-shot measurements of size N_samples) seems like a good 
alternative. Unfortunately, it is well known [i^ that bootstrapping is unreliable for 
biased estimators. As discussed in the previous paragraph, the MLE estimate is biased 
due to physicality constraints. Therefore, without a rigorous theory of error estimation 
for quantum tomography in the presence of sampling noise, it is impossible to evaluate 
the validity of bootstrapping for this problem. 
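
The resampling procedure itself can be sketched in a few lines. The example below is a toy, assuming the data are a list of single-shot 0/1 outcomes and the estimator is simply the observed frequency; in an actual GST analysis the estimator would be the full maximum-likelihood fit. The function names and parameters are hypothetical, chosen for illustration only.

```python
import random
import statistics


def bootstrap_standard_error(shots, estimator, n_resamples=2000, seed=0):
    """Estimate the standard error of `estimator` by resampling `shots`.

    Each replicate draws len(shots) outcomes with replacement from the
    original data and re-runs the estimator; the spread of the replicates
    is taken as the error estimate.
    """
    rng = random.Random(seed)
    n = len(shots)
    replicates = []
    for _ in range(n_resamples):
        resample = rng.choices(shots, k=n)  # sampling with replacement
        replicates.append(estimator(resample))
    return statistics.pstdev(replicates)


def frequency(shots):
    """Fraction of shots that returned outcome 1."""
    return sum(shots) / len(shots)


# Data well inside the allowed range: the bootstrap spread is close to
# the binomial value sqrt(0.5 * 0.5 / 100) = 0.05.
interior_shots = [1] * 50 + [0] * 50

# Data on the boundary (every shot gave 1): every resample is identical,
# so the bootstrap reports zero error regardless of the true probability.
boundary_shots = [1] * 100

if __name__ == "__main__":
    print(bootstrap_standard_error(interior_shots, frequency))
    print(bootstrap_standard_error(boundary_shots, frequency))
```

The boundary case is a simple instance of the concern raised above: when the estimate is pinned against a constraint, resampling the same data cannot reveal the error that the constraint hides.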

A practical solution to both of these problems (validity of standard errors for MLE, 
validity of bootstrapping) may be possible via a Monte Carlo approach. One can 
numerically generate many sample data sets with a specified error model and a given level 
of sampling noise, as we have done in Ch. [4]. One can then perform GST on each sample 
data set and study the distribution of the resulting estimates. It is possible that general 
features, such as the amount of bias in the GST estimates for a given model of systematic 
errors, may be obtained in this way. This can then be used to determine under what 
conditions techniques such as bootstrapping are justified. 
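
A minimal sketch of such a Monte Carlo study is given below, again for a toy model rather than for GST proper. It assumes a single physical parameter p in [0, 1] that is observed through a known SPAM-like offset eps_spam, so that outcome 1 occurs with probability q = eps_spam + (1 - 2*eps_spam)*p. Because the binomial log-likelihood is concave in p, the constrained maximum-likelihood estimate is just the unconstrained estimate clipped to [0, 1]. The model, names, and parameter values are all assumptions made for illustration.

```python
import random
import statistics


def simulate_estimates(p_true, eps_spam, n_shots, n_datasets, seed=0):
    """Generate many synthetic data sets and return the constrained estimates."""
    rng = random.Random(seed)
    q = eps_spam + (1.0 - 2.0 * eps_spam) * p_true  # outcome-1 probability
    estimates = []
    for _ in range(n_datasets):
        # One synthetic experiment: n_shots independent single-shot outcomes.
        k = sum(rng.random() < q for _ in range(n_shots))
        # Unconstrained estimate (can fall outside [0, 1]) ...
        p_raw = (k / n_shots - eps_spam) / (1.0 - 2.0 * eps_spam)
        # ... projected onto the physically allowed interval.
        estimates.append(min(1.0, max(0.0, p_raw)))
    return estimates


def estimate_bias(p_true, eps_spam=0.01, n_shots=1000, n_datasets=500, seed=0):
    """Mean of the constrained estimates minus the true parameter value."""
    estimates = simulate_estimates(p_true, eps_spam, n_shots, n_datasets, seed)
    return statistics.fmean(estimates) - p_true


if __name__ == "__main__":
    # True value on the boundary: clipping pushes the estimates downward.
    print("bias at p = 1.0:", estimate_bias(1.0))
    # True value in the interior: clipping never acts and the bias is small.
    print("bias at p = 0.5:", estimate_bias(0.5))
```

Comparing the two cases gives a rough picture of how much bias the physicality constraint introduces at a given sampling level, which is the kind of information needed to judge when bootstrapping or asymptotic standard errors can be trusted.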

In conclusion, we have presented an overview of gate set tomography for a single 
qubit. It is hoped that this will be useful to practitioners aiming to implement full 
characterization of single as well as multi-qubit gates. 





Chapter 6 

Acknowledgements 


I would like to thank Robin Blume-Kohout, Jay Gambetta, Easwar Magesan, Andrew 
Skinner, Yudan Guo, Ben Palmer, Tim Sweeney, Ramesh Bhandari, and Michael 
Mandelberg for helpful discussions and input. 





Bibliography 


[1] D. Leibfried, D. M. Meekhof, B. E. King, C. Monroe, W. M. 
Itano, and D. J. Wineland, Phys. Rev. Lett. 77, 4281 (1996), URL 
http://link.aps.org/doi/10.1103/PhysRevLett.77.4281. 

[2] I. L. Chuang and M. A. Nielsen, Journal of Modern Optics 44, 2455 (1997), 
ISSN 0950-0340, URL 
http://www.tandfonline.com/doi/abs/10.1080/09500349708231894. 

[3] J. F. Poyatos, J. I. Cirac, and P. Zoller, Phys. Rev. Lett. 78, 390 (1997), URL 
http://link.aps.org/doi/10.1103/PhysRevLett.78.390. 

[4] E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B. Blakestad, J. D. Jost, C. Langer, 
R. Ozeri, S. Seidelin, and D. J. Wineland, Phys. Rev. A 77, 012307 (2008), URL 
http://link.aps.org/doi/10.1103/PhysRevA.77.012307. 

[5] C. A. Ryan, M. Laforest, and R. Laflamme, New J. Phys. 11, 013034 (2009), ISSN 
1367-2630, URL http://iopscience.iop.org/1367-2630/11/1/013034. 

[6] E. Magesan, J. M. Gambetta, and J. Emerson, Phys. Rev. Lett. 106, 180504 (2011), 
URL http://link.aps.org/doi/10.1103/PhysRevLett.106.180504. 

[7] E. Magesan, J. M. Gambetta, and J. Emerson, Phys. Rev. A 85, 042311 (2012), 
URL http://link.aps.org/doi/10.1103/PhysRevA.85.042311. 

[8] S. T. Merkel, J. M. Gambetta, J. A. Smolin, S. Poletto, A. D. Corcoles, B. R. 
Johnson, C. A. Ryan, and M. Steffen, Phys. Rev. A 87, 062119 (2013). 

[9] R. Blume-Kohout, J. K. Gamble, E. Nielsen, J. Mizrahi, J. D. Sterk, and P. Maunz 
(2013), arXiv:1310.4492. 

[10] J. M. Gambetta and E. Magesan, private communication. 

[11] R. Blume-Kohout, private communication. 

[12] C. H. Bennett, D. P. DiVincenzo, J. A. Smolin, and W. K. Wootters, Phys. Rev. A 
54, 3824 (1996), URL http://link.aps.org/doi/10.1103/PhysRevA.54.3824. 





[13] C. Dankert, R. Cleve, J. Emerson, and E. Livine, Phys. Rev. A 80, 012304 (2009), 
URL http://link.aps.org/doi/10.1103/PhysRevA.80.012304. 

[14] S. Kimmel, M. P. da Silva, C. A. Ryan, B. R. Johnson, and T. Ohki, Phys. Rev. X 
4, 011050 (2014), URL http://link.aps.org/doi/10.1103/PhysRevX.4.011050. 

[15] A. C. Dugas, J. Wallman, and J. Emerson, arXiv:1508.06312 [quant-ph] (2015), 
URL http://arxiv.org/abs/1508.06312. 

[16] J. J. Wallman, M. Barnhill, and J. Emerson, Phys. Rev. Lett. 115, 060501 (2015), 
URL http://link.aps.org/doi/10.1103/PhysRevLett.115.060501. 

[17] J. Wallman, C. Granade, R. Harper, and S. T. Flammia, arXiv:1503.07865 
[quant-ph] (2015), URL http://arxiv.org/abs/1503.07865. 

[18] E. Magesan, D. Puzzuoli, C. E. Granade, and D. G. Cory, Phys. Rev. A 87, 012324 
(2013), URL http://link.aps.org/doi/10.1103/PhysRevA.87.012324. 

[19] M. Gutierrez, L. Svec, A. Vargo, and K. R. Brown, Phys. Rev. A 87, 030302 (2013), 
URL http://link.aps.org/doi/10.1103/PhysRevA.87.030302. 

[20] D. Puzzuoli, C. Granade, H. Haas, B. Criger, E. Magesan, 
and D. G. Cory, Phys. Rev. A 89, 022306 (2014), URL 
http://link.aps.org/doi/10.1103/PhysRevA.89.022306. 

[21] J. P. Gaebler, A. M. Meier, T. R. Tan, R. Bowler, Y. Lin, D. Hanneke, J. D. Jost, 
J. P. Home, E. Knill, D. Leibfried, et al., Phys. Rev. Lett. 108, 260503 (2012), URL 
http://link.aps.org/doi/10.1103/PhysRevLett.108.260503. 

[22] R. Barends, J. Kelly, A. Megrant, A. Veitia, D. Sank, E. Jeffrey, 
T. C. White, J. Mutus, A. G. Fowler, B. Campbell, 
et al., Nature 508, 500 (2014), ISSN 0028-0836, URL 
http://www.nature.com/nature/journal/v508/n7497/full/nature13171.html. 

[23] J. Kelly, R. Barends, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, 
A. G. Fowler, I.-C. Hoi, E. Jeffrey, et al., Physical Review Letters 112 (2014), ISSN 
0031-9007, 1079-7114, arXiv:1403.0035, URL http://arxiv.org/abs/1403.0035. 

[24] J. T. Muhonen, A. Laucht, S. Simmons, J. P. Dehollain, R. Kalra, F. E. 
Hudson, S. Freer, K. M. Itoh, D. N. Jamieson, J. C. McCallum, et al., 
arXiv:1410.2338 [cond-mat, physics:quant-ph] (2014), URL 
http://arxiv.org/abs/1410.2338. 

[25] J. M. Chow, J. M. Gambetta, L. Tornberg, J. Koch, L. S. Bishop, A. A. Houck, 
B. R. Johnson, L. Frunzio, S. M. Girvin, and R. J. Schoelkopf, Phys. Rev. Lett. 
102, 090502 (2009), supplementary material. 





[26] J. M. Chow, J. M. Gambetta, A. D. Corcoles, S. T. Merkel, J. A. Smolin, C. Rigetti, 
S. Poletto, G. A. Keefe, M. B. Rothwell, J. R. Rozen, et al., Phys. Rev. Lett. 109, 
060501 (2012), supplementary material. 

[27] R. Bhandari and N. A. Peters, arXiv:1502.01016 [quant-ph] (2015). 

[28] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information 
(Cambridge University Press, 2000). 

[29] S. Haroche and J.-M. Raimond, Exploring the Quantum: Atoms, Cavities and 
Photons (Oxford University Press, 2006). 

[30] A. Y. Kitaev, A. H. Shen, and M. N. Vyalyi, Classical and Quantum Computation 
(American Mathematical Society, 2002). 

[31] E. Magesan, R. Blume-Kohout, and J. Emerson, Phys. Rev. A 84, 012309 (2011), 
URL http://link.aps.org/doi/10.1103/PhysRevA.84.012309. 

[32] M.-D. Choi, Linear Algebra and its Applications 10, 285 (1975), 
ISSN 0024-3795, URL 
http://www.sciencedirect.com/science/article/pii/0024379575900750. 

[33] A. Jamiolkowski, Reports on Mathematical Physics 3, 275 (1972), ISSN 0034-4877, 
URL http://www.sciencedirect.com/science/article/pii/0034487772900110. 

[34] C. Stark, Phys. Rev. A 89, 052109 (2014), URL 
http://link.aps.org/doi/10.1103/PhysRevA.89.052109. 

[35] S. Boyd and L. Vandenberghe, Convex Optimization (Cambridge University Press, 
2004). 

[36] Y. Guo, private communication. 

[37] R. Blume-Kohout, New J. Phys. 12, 043034 (2010). 

[38] J. M. Chow, J. M. Gambetta, A. D. Corcoles, S. T. Merkel, J. A. Smolin, C. Rigetti, 
S. Poletto, G. A. Keefe, M. B. Rothwell, J. R. Rozen, et al., Phys. Rev. Lett. 109, 
060501 (2012). 

[39] M. A. Nielsen, Physics Letters A 303, 249 (2002), ISSN 0375-9601, URL 
http://www.sciencedirect.com/science/article/pii/S0375960102012720. 

[40] C. J. Stark and A. W. Harrow, arXiv:1412.7437 [quant-ph] (2014), 
URL http://arxiv.org/abs/1412.7437. 

[41] W. H. Greene, Econometric Analysis (Prentice Hall, 2011), 7th ed. 

[42] R. Blume-Kohout, arXiv:1202.5270 [quant-ph] (2012), URL 
http://arxiv.org/abs/1202.5270. 




